Optimization before Evaluation: Evaluation with Unoptimised Prompts Can be Misleading

LLM evaluation frameworks can be misleading without prompt optimization

By the PulseAugur editorial team · [1 source] · 2026-05-01 04:00

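A new paper by Nicholas Sadjoli and colleagues argues that current large language model (LLM) evaluation frameworks are misleading because they apply the same static prompt to every model. The study shows that prompt optimization (PO) techniques, which are commonly used in industry to maximize performance, can significantly change model rankings. The findings indicate that practitioners should optimize prompts separately for each model when evaluating LLMs for a specific task.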

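Impact: Highlights potential inaccuracies in current LLM benchmarks and underlines that accurate model selection requires task-specific prompt tuning.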
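The ranking effect described above can be sketched in a few lines. This is a minimal illustration, not the paper's method: the model names, prompt names, and scores are invented toy values, and a plain search over a fixed candidate list stands in for a real prompt optimization technique.

```python
from typing import Callable, Dict, List, Tuple

# (model, prompt) -> task score; a hypothetical interface for this sketch.
Scorer = Callable[[str, str], float]


def rank_static(models: List[str], prompt: str, score: Scorer) -> List[str]:
    """Rank models when every model is evaluated with the same prompt."""
    return sorted(models, key=lambda m: score(m, prompt), reverse=True)


def rank_optimized(models: List[str], candidates: List[str], score: Scorer) -> List[str]:
    """Rank models after picking each model's best-scoring candidate prompt."""
    best = {m: max(score(m, p) for p in candidates) for m in models}
    return sorted(models, key=lambda m: best[m], reverse=True)


# Toy scores, made up purely for illustration (NOT results from the paper).
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("model_a", "terse"): 0.70,
    ("model_a", "step_by_step"): 0.72,
    ("model_b", "terse"): 0.60,
    ("model_b", "step_by_step"): 0.80,
}


def toy_score(model: str, prompt: str) -> float:
    return TOY_SCORES[(model, prompt)]


models = ["model_a", "model_b"]
static_ranking = rank_static(models, "terse", toy_score)
optimized_ranking = rank_optimized(models, ["terse", "step_by_step"], toy_score)

print(static_ranking)     # ['model_a', 'model_b']
print(optimized_ranking)  # ['model_b', 'model_a']
```

With the shared "terse" prompt, model_a ranks first; once each model gets its own best prompt, the order flips. In a real evaluation the prompt would normally be selected on a development split and the final score reported on held-out data, so that the optimization step does not inflate the result.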

Ranking rationale: Academic paper on LLM evaluation methodology published on arXiv.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · TIER_1 · English (EN) · Nicholas Sadjoli, Tim Siefken, Atin Ghosh, Yifan Mai, Daniel Dahlmeier · 2026-05-01 04:00

Optimization before Evaluation: Evaluation with Unoptimised Prompts Can be Misleading

arXiv:2604.27637v1 Announce Type: new Abstract: Current Large Language Model (LLM) evaluation frameworks utilize the same static prompt template across all models under evaluation. This differs from the common industry practice of using prompt optimization (PO) techniques to opti…
