The Signal-Coverage Matrix: Stratifying Type and Semantic Errors in Statement Autoformalization

New matrix improves error analysis for LLM autoformalization

By the PulseAugur editorial team · [2 sources] · 2026-06-26 12:15

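Researchers have introduced a "signal-coverage matrix" to better evaluate how large language models (LLMs) perform on autoformalization tasks. The matrix classifies errors along two dimensions, type correctness and semantic equivalence, rather than relying on a single scalar metric. Experiments with DeepSeek V4-Pro on ProofNet# and MiniF2F-test show that although the overall true success rate rises significantly, most of the gain comes from recovering type-level errors; semantic errors improve less, and some new errors even appear.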
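To make the idea concrete, the sketch below tallies a 2×2 matrix from two boolean signals per statement and tracks how items move between cells from one run to the next. It is a minimal illustration, not the paper's implementation: the names `Outcome`, `coverage_matrix`, `true_success_rate`, and `transitions` are hypothetical, the semantic check is assumed to be an independent pass/fail signal, "true success" is assumed to mean passing both signals, and the toy data is invented.

```python
from collections import Counter
from typing import NamedTuple


class Outcome(NamedTuple):
    type_ok: bool      # True if the Lean elaborator accepted the statement
    semantic_ok: bool  # True if a (hypothetical) equivalence check accepted it


def coverage_matrix(outcomes):
    """Count outcomes in each of the four (type_ok, semantic_ok) cells."""
    counts = Counter(outcomes)
    return {
        (t, s): counts[Outcome(t, s)]
        for t in (True, False)
        for s in (True, False)
    }


def true_success_rate(outcomes):
    """Fraction of items that pass both signals (assumed definition)."""
    outcomes = list(outcomes)
    if not outcomes:
        return 0.0
    return sum(o.type_ok and o.semantic_ok for o in outcomes) / len(outcomes)


def transitions(before, after):
    """Count how each item moves between cells from one run to another."""
    return Counter(zip(before, after))


# Toy data, invented for illustration only (not results from the paper).
baseline = [
    Outcome(False, False),
    Outcome(False, True),
    Outcome(True, False),
    Outcome(True, True),
]
improved = [
    Outcome(True, True),   # type error recovered, now fully correct
    Outcome(True, True),   # type error recovered
    Outcome(True, False),  # semantic error unchanged
    Outcome(True, False),  # regression: a new semantic error
]

matrix = coverage_matrix(improved)
moves = transitions(baseline, improved)
```

In this toy run the success rate doubles, yet the transition counts show that every gain came from items that previously failed type checking, while one previously correct item regressed. A single scalar would hide exactly that kind of pattern.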

Impact: Provides a more fine-grained evaluation framework for LLM autoformalization, which may guide future model development.

Ranking rationale: This cluster contains a research paper that details a new method for evaluating LLM performance on a specific task.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Chengxiao Dai, Zhaokun Yan, Zhanhui Lin · 2026-06-29 04:00

The Signal-Coverage Matrix: Stratifying Type and Semantic Errors in Statement Autoformalization

arXiv:2606.28013v1 Announce Type: new Abstract: Headline type-correctness (TC%) of LLM autoformalization has climbed from ~53% to ~76% in two years, yet this scalar conceals which errors each method resolves. We propose a signal-coverage matrix that crosses the Lean …
arXiv cs.CL TIER_1 English(EN) · Zhanhui Lin · 2026-06-26 12:15

The Signal-Coverage Matrix: Stratifying Type and Semantic Errors in Statement Autoformalization

Headline type-correctness (TC%) of LLM autoformalization has climbed from ~53% to ~76% in two years, yet this scalar conceals which errors each method resolves. We propose a signal-coverage matrix that crosses the Lean elaborator (pass/fail) with a semantic-equivalen…
