English(EN) How Well Do Models Follow Their Constitutions?

新的审计管道显示 Claude 和 GPT 模型能更好地遵循 AI 宪法

作者 PulseAugur 编辑部 · [3 个来源] · 2026-05-24 22:18

一篇新的学术论文提出了一种多方法审计管道，用于评估 AI 模型在多大程度上遵守其指定的行为准则，例如 Anthropic 的宪法和 OpenAI 的模型规范。研究发现，新一代的 Claude 和 GPT 模型在遵循这些规范方面有了显著改进，违规率大幅下降。然而，在 AI 身份质疑、代理部署中的不可逆操作以及生成虚假量化声明等领域仍观察到失败。 AI

影响新的审计方法可能会促使 AI 实验室改进模型与既定原则的一致性，从而可能带来更可靠的 AI 行为。

排序理由学术论文提出了一种新的审计方法，并展示了模型遵守规范的发现。

在 arXiv cs.AI 阅读 →

AI 生成摘要 · Google Gemini · 来自 3 个来源。我们如何撰写摘要 →

报道来源 [3]

arXiv cs.AI TIER_1 English(EN) · Arya Jakkli, Senthooran Rajamanoharan, Neel Nanda · 2026-05-26 04:00

模型遵循其宪法的程度如何？

arXiv:2605.24229v1 Announce Type: new Abstract: Frontier AI developers now train models against long written behavioral specifications, such as Anthropic's constitution (Anthropic, 2025a) and OpenAI's Model Spec (OpenAI, 2025a), integrated into post-training via methods like char…
r/ClaudeAI TIER_2 English(EN) · /u/Similar-Cat-7601 · 2026-05-25 15:11

'Claude 无法完成此响应。请稍后重试。'

<div class="md"><p>Running Pro subscription here, incredibly frustrated by this, admittedly my prompt is decently long (i already asked other LLMs to optimise it to consume as little claude tokens as possible) and I wanted it to contruct an excel document (be it wi…
r/ClaudeAI TIER_2 English(EN) · /u/abcfh · 2026-05-24 22:18

Claude 的个性最近变得居高临下且刻薄？

<div class="md"><p>I've been using Sonnet 4.6. Over the last couple months I've noticed that a lot of the answers I get from Claude about personal topics are worded in a condescending way. Sometimes it will criticize me for things I never I did, or interpret things…

报道来源 [3]

模型遵循其宪法的程度如何？

'Claude 无法完成此响应。请稍后重试。'

Claude 的个性最近变得居高临下且刻薄？

相关实体

相关话题