Statistical Impossibility and Possibility of Aligning LLMs with Human Preferences: From Condorcet Paradox to Nash Equilibrium

Study finds reward-model-based LLM alignment faces a statistical impossibility

By the PulseAugur editorial team · [1 source] · 2026-05-04 04:00

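A new paper examines the statistical challenges of aligning large language models (LLMs) with diverse human preferences. The researchers prove that, because Condorcet cycles (majority preferences of the form A over B, B over C, and C over A) are pervasive in human preferences, fully aligning with those preferences is statistically impossible for existing reward-based methods such as reinforcement learning from human feedback (RLHF). The study also shows, however, that non-reward-based approaches such as Nash learning can statistically preserve minority preferences by having the LLM play a mixed strategy.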
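To make the Condorcet-cycle argument concrete, here is a minimal toy sketch. It is not the paper's construction or algorithm: the three annotators, their rankings, and the helper names (`win_rate`, `consistent_orderings`, `payoff`) are hypothetical and chosen only for illustration. It checks that no single ranking of the responses, which is what a scalar reward would induce, agrees with every pairwise majority, and that a uniform mixed policy is not beaten by any response in the pairwise preference game.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical toy data: three annotators each rank three responses, best first.
rankings = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]
responses = ["A", "B", "C"]
HALF = Fraction(1, 2)

def win_rate(x, y):
    """Fraction of annotators who prefer x to y (1/2 when x and y are the same)."""
    if x == y:
        return HALF
    wins = sum(r.index(x) < r.index(y) for r in rankings)
    return Fraction(wins, len(rankings))

def consistent_orderings():
    """Orderings in which every earlier response beats every later one by majority.

    A scalar reward induces such an ordering, so an empty list means no reward
    can reproduce all pairwise majorities in this toy example.
    """
    n = len(responses)
    return [
        order for order in permutations(responses)
        if all(win_rate(order[i], order[j]) > HALF
               for i in range(n) for j in range(i + 1, n))
    ]

def payoff(policy, opponent):
    """Expected win rate of a mixed policy (response -> probability) against one response."""
    return sum(p * win_rate(x, opponent) for x, p in policy.items())

uniform = {x: Fraction(1, 3) for x in responses}
always_a = {"A": Fraction(1)}

print(consistent_orderings())                      # no transitive ranking exists
print([payoff(uniform, y) for y in responses])     # mixed policy vs each response
print(min(payoff(always_a, y) for y in responses)) # worst case for a pure policy
```

In this toy game the uniform policy wins exactly half the time against every response, so no opponent can exploit it, while the policy that always answers A loses to C two times out of three. Each annotator's top choice also keeps probability 1/3 under the mixed policy, which is the sense in which minority preferences survive here.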

Impact: Highlights the theoretical limits of current LLM alignment methods and proposes alternatives that preserve diverse preferences.

Ranking rationale: An academic paper on LLM alignment theory.

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.LG TIER_1 English(EN) · Kaizhao Liu, Qi Long, Zhekun Shi, Weijie J. Su, Jiancong Xiao · 2026-05-04 04:00

Statistical Impossibility and Possibility of Aligning LLMs with Human Preferences: From Condorcet Paradox to Nash Equilibrium

arXiv:2503.10990v2 Announce Type: replace-cross Abstract: Aligning large language models (LLMs) with diverse human preferences is critical for ensuring fairness and informed outcomes when deploying these models for decision-making. In this paper, we seek to uncover fundamental st…