New Metric Measures Test Adequacy for LLM-Generated Code at the Prompt Level

By the PulseAugur editorial team · [1 source] · 2026-07-03 04:00

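Researchers have introduced a new metric, Prompt Coverage Adequacy, for testing code generated by large language models (LLMs). The criterion measures how thoroughly a test suite exercises the requirements stated in the prompt. It is analogous to traditional code coverage, but it operates on the prompt rather than on the code. By leveraging the LLM's attention mechanism, Prompt Coverage Adequacy has shown the potential to detect more than 30% more faults than traditional code coverage approaches, which the researchers present as a better fit for LLM-driven software development.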
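The summary does not say how the paper actually computes the metric. Purely to illustrate the analogy with code coverage, here is a minimal sketch under stated assumptions. The prompt is assumed to be already split into requirement segments. Each test is assumed to carry one relevance score per segment, for example an aggregate of attention weights. A segment counts as covered when at least one test scores at or above a threshold. The function name `prompt_coverage`, the score input, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def prompt_coverage(
    segments: List[str],
    scores_per_test: Dict[str, List[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> Tuple[float, List[str]]:
    """Return (coverage ratio, uncovered segments) for a set of prompt segments.

    scores_per_test maps each test name to one relevance score per segment.
    This is a hypothetical input, e.g. aggregated attention weights.
    """
    if not segments:
        raise ValueError("prompt must contain at least one segment")
    for name, scores in scores_per_test.items():
        if len(scores) != len(segments):
            raise ValueError(f"test {name!r} must score every segment")

    # A segment is covered if any test's score reaches the threshold,
    # mirroring "a line is covered if any test executes it".
    uncovered = [
        segment
        for i, segment in enumerate(segments)
        if not any(scores[i] >= threshold for scores in scores_per_test.values())
    ]
    ratio = (len(segments) - len(uncovered)) / len(segments)
    return ratio, uncovered
```

For example, with the segments "parse the input date", "reject negative amounts", and "round to two decimals", plus two tests whose scores are `[0.9, 0.1, 0.2]` and `[0.3, 0.2, 0.8]`, the sketch reports 2/3 coverage and flags "reject negative amounts" as untested. That uncovered list is what a tester would use to decide which test to write next.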

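Impact: The new metric could make testing of AI-generated code more reliable and effective. This matters more as LLMs become increasingly integrated into software development workflows.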



AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · English (EN) · Florian Tambon, Michael Konstantinou, Cedric Richter, Charles Chenouard, Mark Harman, Mike Papadakis · 2026-07-03 04:00

Prompt Coverage Adequacy

arXiv:2607.02057v1 Announce Type: cross Abstract: In recent years, it has become increasingly evident that large language models (LLMs) and autonomous agents raise the level of abstraction in software development by shifting the focus from writing precise procedures to expressing…
