Psychological Steering in LLMs: An Evaluation of Effectiveness and Trustworthiness

New Benchmark Evaluates Emotional Steering and Trustworthiness in LLMs

By the PulseAugur editorial team · [1 source] · 2026-07-03 04:00

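Researchers have developed PsySET, a new benchmark for evaluating how effective and how trustworthy large language models (LLMs) remain when their emotional states and personality traits are steered. The study tests four LLM families with several steering techniques, including prompt engineering, fine-tuning, and representation engineering. The results show that prompt engineering is effective but lacks fine-grained control, whereas vector injection offers better controllability at a possible slight cost to output quality. The study also highlights potential trustworthiness concerns: positive emotions such as joy reduce robustness to factual inaccuracies and increase bias, while anger increases toxicity, as expected, but improves resistance to data leakage.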
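To make "vector injection" and its finer control concrete, here is a minimal sketch of one common form of the idea: a difference-of-means steering vector added to a hidden state with a tunable strength. This is an assumption-laden illustration, not the procedure used in the PsySET paper. The function names, the toy three-dimensional activations, and the `alpha` strength parameter are all hypothetical.

```python
# Illustrative sketch only: toy vectors stand in for real model activations.
# Nothing here is taken from the PsySET paper; all names are hypothetical.

def mean_vector(rows):
    """Element-wise mean of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_vector(target_acts, neutral_acts):
    """Mean activation on target-emotion prompts minus mean on neutral prompts."""
    mu_target = mean_vector(target_acts)
    mu_neutral = mean_vector(neutral_acts)
    return [t - n for t, n in zip(mu_target, mu_neutral)]

def inject(hidden, direction, alpha):
    """Add the steering direction to a hidden state, scaled by strength alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Made-up activations for prompts written in a "joy" style vs. a neutral style.
joy_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]
neutral_acts = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

v = steering_vector(joy_acts, neutral_acts)
hidden = [0.5, 0.5, 0.5]

# alpha = 0 leaves the state untouched; larger alpha pushes it further along v.
for alpha in (0.0, 0.5, 2.0):
    print(alpha, inject(hidden, v, alpha))
```

The continuous `alpha` knob is what gives this style of steering its graded control, in contrast to a prompt, which is either present or absent. In a real model, a large `alpha` can push activations away from what the network normally sees, which is one plausible reason output quality may degrade.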

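Impact: Establishes a framework for evaluating the safety and reliability of LLMs in social interactions, which is essential for developing more human-centered AI applications.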

Ranking rationale: This cluster contains an academic paper that details a new benchmark for evaluating LLM capabilities.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Amin Banayeeanzade, Ala N. Tak, Fatemeh Bahrani, Anahita Bolourani, Leonardo Blas, Emilio Ferrara, Jonathan Gratch, Sai Praneeth Karimireddy · 2026-07-03 04:00

Psychological Steering in LLMs: An Evaluation of Effectiveness and Trustworthiness

arXiv:2510.04484v2 Announce Type: replace-cross Abstract: The ability to control LLMs' emulated emotional states and personality traits is an essential step in enabling rich, human-centered interactions in socially interactive settings. We introduce PsySET, a Psychologically-info…

