Ask the Right Comparison: Bias-Aware Bayesian Active Top-$k$ Ranking with LLM Judges

New framework improves LLM judges by accounting for bias

By the PulseAugur editorial team · [2 sources] · 2026-07-02 12:39

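A new research paper introduces a bias-aware Bayesian active learning framework designed to improve the accuracy of large language models (LLMs) when they are used as judges in ranking tasks. The framework explicitly models judge-specific biases, such as verbosity and position effects, and regularizes them with shrinkage priors. It also includes a top-k-aware acquisition rule for efficiently identifying the best items within a limited comparison budget. Experiments show that the approach significantly outperforms naive aggregation, especially with cheap LLM judges that exhibit strong biases, whereas frontier models show minimal bias.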
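To make the idea concrete, here is a minimal sketch of a bias-aware pairwise model. It is not the paper's method: it assumes a Bradley-Terry likelihood with a single position-bias term `b_pos`, uses zero-mean Gaussian priors as a simple stand-in for the shrinkage priors, fits a MAP point estimate by gradient ascent instead of a full posterior, and leaves out verbosity bias. The helper `next_pair` is a hypothetical heuristic that queries the two items on either side of the current top-k boundary; it only gestures at what a top-k-aware acquisition rule does. All names, priors, and simulation settings are illustrative assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_map(comparisons, n_items, steps=500, lr=0.5, tau_theta=1.0, tau_bias=1.0):
    """MAP fit for the assumed model P(first wins) = sigmoid(theta[i] - theta[j] + b_pos).

    comparisons: list of (i, j, first_won), where item i was shown first.
    """
    theta = [0.0] * n_items
    b_pos = 0.0
    n = len(comparisons)
    for _ in range(steps):
        # Gradient of the zero-mean Gaussian log-priors (the shrinkage part).
        g_theta = [-t / tau_theta ** 2 for t in theta]
        g_pos = -b_pos / tau_bias ** 2
        # Gradient of the Bernoulli log-likelihood.
        for i, j, first_won in comparisons:
            resid = first_won - sigmoid(theta[i] - theta[j] + b_pos)
            g_theta[i] += resid
            g_theta[j] -= resid
            g_pos += resid
        theta = [t + lr * g / n for t, g in zip(theta, g_theta)]
        b_pos += lr * g_pos / n
    return theta, b_pos


def next_pair(theta, k):
    """Hypothetical heuristic: compare the items just inside and just outside the top-k."""
    order = sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)
    return order[k - 1], order[k]


# Simulate a judge that favors whichever answer is shown first (assumed bias of 1.0).
random.seed(0)
n_items, k = 8, 3
quality = [random.gauss(0, 1) for _ in range(n_items)]
comparisons = []
for _ in range(600):
    i, j = random.sample(range(n_items), 2)  # i is presented first
    p_first = sigmoid(quality[i] - quality[j] + 1.0)
    comparisons.append((i, j, 1 if random.random() < p_first else 0))

theta, b_pos = fit_map(comparisons, n_items)
pair = next_pair(theta, k)
print("estimated position bias:", round(b_pos, 2))
print("next pair to compare:", pair)
```

Because presentation order is randomized in the simulation, the position term acts as an intercept that the model can separate from item quality. Adding a verbosity term would need extra care: a per-item length covariate is collinear with per-item quality under this likelihood, so only the priors or additional structure could tell them apart.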

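Impact: Improves the reliability of LLM-based evaluation, enabling more accurate model comparisons and better selection of higher-quality outputs.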

Ranking rationale: A paper introducing a new method for LLM evaluation.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.LG · Jian Xu, Delu Zeng, John Paisley, Qibin Zhao · 2026-07-03 04:00

Ask the Right Comparison: Bias-Aware Bayesian Active Top-$k$ Ranking with LLM Judges

arXiv:2607.02104v1 · Abstract: Large language models (LLMs) are increasingly used as cheap, scalable judges that compare candidate outputs pairwise -- to rank responses, select models, or triage papers. Yet LLM judges are both noisy and systematically biased: the…

arXiv cs.LG · Qibin Zhao · 2026-07-02 12:39

Ask the Right Comparison: Bias-Aware Bayesian Active Top-$k$ Ranking with LLM Judges

Large language models (LLMs) are increasingly used as cheap, scalable judges that compare candidate outputs pairwise -- to rank responses, select models, or triage papers. Yet LLM judges are both noisy and systematically biased: they favor verbose or well-formatted answers and ex…
