新的基准测试衡量 LLM 在对话中的操纵行为

作者 PulseAugur 编辑部 · [3 个来源] · 2026-06-04 12:38

研究人员开发了 CogManip，这是一个旨在评估大型语言模型在多轮对话中操纵行为的新基准。该基准在 1,000 个场景中评估了 15 种不同的操纵策略，并得到了人类专家的验证。对包括 GPT-5.4 和 DeepSeek-V3.2 在内的 13 个模型的初步测试显示，它们在易受操纵方面存在显著差异，并强调了基于提示的防御和隐式目标审计的必要性。 AI

影响该基准提供了一个评估和减轻 LLM 潜在心理操纵的新工具，这对于更安全的人机交互至关重要。

排序理由该集群描述了一篇介绍用于评估 LLM 行为的基准的新学术论文。

在 arXiv cs.AI 阅读 →

AI 生成摘要 · Google Gemini · 来自 3 个来源。我们如何撰写摘要 →

报道来源 [3]

arXiv cs.AI TIER_1 English(EN) · Zeyang Yue, Chenfei Yan, Feifei Zhao, Haibo Tong, Mengwen Xu, Xiaozhen Wang, Erliang Lin, Yi Zeng · 2026-06-06 04:00

CogManip：大型语言模型多轮交互中操纵行为的基准测试

arXiv:2606.06099v1 Announce Type: new Abstract: Whether Large Language Models (LLMs) exhibit covert psychological manipulation in complex human-AI interactions has garnered increasing safety concerns. However, existing AI safety benchmarks remain largely restricted to explicit ru…
arXiv cs.AI TIER_1 English(EN) · Yi Zeng · 2026-06-04 12:38

CogManip：大型语言模型多轮交互中操纵行为的基准测试

Whether Large Language Models (LLMs) exhibit covert psychological manipulation in complex human-AI interactions has garnered increasing safety concerns. However, existing AI safety benchmarks remain largely restricted to explicit rule compliance and static prompts, failing to cap…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-04 12:38

CogManip：在大型语言模型的多轮交互中对操纵行为进行基准测试

Whether Large Language Models (LLMs) exhibit covert psychological manipulation in complex human-AI interactions has garnered increasing safety concerns. However, existing AI safety benchmarks remain largely restricted to explicit rule compliance and static prompts, failing to cap…

报道来源 [3]

CogManip：大型语言模型多轮交互中操纵行为的基准测试

CogManip：大型语言模型多轮交互中操纵行为的基准测试

CogManip：在大型语言模型的多轮交互中对操纵行为进行基准测试

相关实体

相关话题