Measuring Semantic Progress in Multi-turn Dialogue via Information Gain

New Metric Uses Information Gain to Measure Semantic Progress in Dialogue

By the PulseAugur editorial team · [1 source] · 2026-06-10 17:04

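Researchers have developed a new metric that evaluates the quality of multi-turn dialogue by measuring semantic progress. The metric quantifies the accumulation of novel, relevant, and non-redundant information across dialogue turns, framing it as a reduction in conditional uncertainty. The method uses information-theoretic measures approximated in embedding space, offering a reproducible and efficient alternative to LLM-based evaluation. Experiments show that the method achieves comparable agreement with human judgments, particularly on benchmarks such as MT-Bench and UltraFeedback, and that it can run on CPU-only systems.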
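To make the idea of "conditional uncertainty reduction in embedding space" concrete, here is a minimal sketch. It is not the paper's formula, which the summary does not give. It assumes each turn and the question are already embedded as vectors, and it uses a simple stand-in for uncertainty: the squared norm of the part of the question embedding that the turns seen so far do not yet explain. The function name `semantic_progress` and the toy 3-dimensional vectors are hypothetical.

```python
import numpy as np

def semantic_progress(question_vec, turn_vecs, tol=1e-9):
    """Per-turn gain under an ASSUMED proxy (not the paper's definition):
    how much each turn shrinks the unexplained part of the question embedding,
    given the turns that came before it."""
    q = np.asarray(question_vec, dtype=float)
    q = q / np.linalg.norm(q)      # unit norm, so total uncertainty starts at 1
    residual = q.copy()            # part of the question not yet explained
    basis = []                     # orthonormal directions covered by earlier turns
    gains = []
    for turn in turn_vecs:
        v = np.asarray(turn, dtype=float)
        # Strip what earlier turns already cover (Gram-Schmidt step),
        # so redundant content cannot be counted twice.
        for b in basis:
            v = v - np.dot(v, b) * b
        norm = np.linalg.norm(v)
        if norm < tol:             # nothing new in this turn
            gains.append(0.0)
            continue
        v = v / norm
        basis.append(v)
        before = float(np.dot(residual, residual))
        residual = residual - np.dot(residual, v) * v
        after = float(np.dot(residual, residual))
        gains.append(before - after)   # relevance-weighted novelty
    return gains

# Toy example with hand-made vectors instead of real sentence embeddings.
question = [1.0, 0.0, 0.0]
turns = [
    [1.0, 1.0, 0.0],    # partly relevant
    [1.0, 1.0, 0.0],    # exact repeat of the first turn
    [0.0, 0.0, 1.0],    # new but unrelated to the question
    [1.0, -1.0, 0.0],   # new and relevant
]
print(semantic_progress(question, turns))
```

In this toy run the repeated turn and the off-topic turn both score zero, while the two relevant turns split the available gain between them. A real setup would replace the hand-made vectors with sentence embeddings and would likely use a more careful uncertainty estimate.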

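Impact: Offers a more objective and reproducible way to evaluate conversational AI, which could improve model development and user experience.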

Ranking rationale: This cluster contains an academic paper introducing a new evaluation metric for dialogue systems.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CL TIER_1 English(EN) · Dominik Janzing · 2026-06-10 17:04

Measuring Semantic Progress in Multi-turn Dialogue via Information Gain

Evaluating multi-turn dialogue is challenging because quality emerges across turns rather than within individual responses. We focus on a key dimension of information-seeking dialogue: semantic progress, defined as the accumulation of new, question-relevant, and non-redundant inf…

