The Necessity of Setting Temperature in LLM-as-a-Judge

Temperature settings for LLM judges affect consistency and exploration

By the PulseAugur editorial team · [1 source] · 2026-06-08 04:00

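A newly published arXiv study investigates how decoding temperature affects the performance of large language models (LLMs) when they are used as judges to evaluate other models' outputs. The study indicates that higher temperatures can reduce consistency and increase formatting errors, but can also reveal latent uncertainty, which may be beneficial in complex evaluation scenarios. The findings suggest that temperature should be a task-dependent choice that balances reliability against exploration, rather than a fixed hyperparameter.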
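To make the consistency trade-off concrete, here is a minimal sketch of how temperature reshapes a judge's sampling distribution. It is not the paper's method: the logits, the 1-to-5 score labels, and the helper names (`softmax_with_temperature`, `sample_verdicts`, `majority_agreement`) are all hypothetical, chosen only to illustrate why repeated judgments tend to agree less as temperature rises.

```python
import math
import random
from collections import Counter


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing each by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_verdicts(logits, labels, temperature, n, rng):
    """Draw n independent verdicts from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(labels, weights=probs, k=n)


def majority_agreement(verdicts):
    """Fraction of verdicts that match the most common verdict."""
    counts = Counter(verdicts)
    return counts.most_common(1)[0][1] / len(verdicts)


# Hypothetical values: logits a judge might assign to scores 1..5
# for a single response. These numbers are invented for illustration.
LABELS = [1, 2, 3, 4, 5]
LOGITS = [0.0, 0.5, 1.0, 3.0, 2.0]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    for t in (0.1, 0.7, 1.5):
        verdicts = sample_verdicts(LOGITS, LABELS, t, 200, rng)
        print(f"T={t}: agreement={majority_agreement(verdicts):.2f}")
```

In this toy setup, a low temperature concentrates nearly all probability on the top score, so repeated verdicts agree. A higher temperature spreads probability across neighboring scores, which lowers agreement but also exposes how close the alternatives were.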

Impact: Offers guidance for tuning LLM-as-a-judge settings to obtain more reliable and more insightful model evaluations.

Ranking rationale: Academic paper on LLM evaluation methods.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Lujun Li, Lama Sleem, Yangjie Xu, Yewei Song, Aolin Jia, Jerome Francois, Radu State · 2026-06-08 04:00

The Necessity of Setting Temperature in LLM-as-a-Judge

arXiv:2603.28304v2. Abstract: Using large language models (LLMs) as judges for evaluating model outputs has emerged as an important paradigm for automated evaluation. However, the choice of decoding temperature in LLM-as-a-judge settings is still largely cho…

