When Roleplaying, Do Models Believe What They Say?

Study: Role-play changes what large language models say, not their core beliefs

By the PulseAugur editorial team · [2 sources] · 2026-06-09 22:46

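A new research paper examines whether large language models internalize beliefs when they play different roles. The study finds that although models can adopt a persona and change their statements, role-play has only a limited effect on their underlying internal representations of truthfulness. This contrasts with models trained on harmful advice, which show larger shifts in their internal representations and tend to defend false claims.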

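Impact: The paper examines the distinction between manipulating a model's outputs and shifting its internal beliefs, which is critical for understanding AI safety and alignment.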

Ranking rationale: This cluster contains an academic paper detailing research findings on large language model behavior.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Benjamin Sturgeon, David Africa, Sid Black · 2026-06-11 04:00

When Roleplaying, Do Models Believe What They Say?

arXiv:2606.11502v1 Announce Type: cross Abstract: Language models can state that "the Earth orbits the Sun" and, when role-playing Aristotle, assert the opposite. Recent work argues that persona adoption is fundamental to how language models operate, with models constantly select…

arXiv cs.CL TIER_1 English(EN) · Sid Black · 2026-06-09 22:46

When Roleplaying, Do Models Believe What They Say?

Language models can state that "the Earth orbits the Sun" and, when role-playing Aristotle, assert the opposite. Recent work argues that persona adoption is fundamental to how language models operate, with models constantly selecting the most appropriate persona for a given conte…