English(EN) Research we co-authored on subliminal learning—how LLMs can pass on traits like preferences or misalignment through hidden signals in data—was published today i

Anthropic 研究 LLM 从数据中学习隐藏特征的能力

作者 PulseAugur 编辑部 · [1 个来源] · 2026-04-15 19:09

Anthropic 的研究人员发表了一篇论文，详细介绍了一种他们称之为“潜移默化学习”的现象。这项研究表明，大型语言模型可能会通过训练数据中嵌入的微妙隐藏信号，无意中习得并传递不良特征，例如偏见或失准。这些发现突显了人工智能安全和对齐方面的一个新挑战，表明即使是看似无害的数据也可能以意想不到的方式影响模型行为。 AI

排序理由关于一种新颖的人工智能安全现象的学术论文的发表。

在 X — Anthropic 阅读 →

AI 生成摘要 · Google Gemini · 来自 1 个来源。我们如何撰写摘要 →

报道来源 [1]

X — Anthropic TIER_1 English(EN) · AnthropicAI · 2026-04-15 19:09

Research we co-authored on subliminal learning—how LLMs can pass on traits like preferences or misalignment through hidden signals in data—was published today i

Research we co-authored on subliminal learning—how LLMs can pass on traits like preferences or misalignment through hidden signals in data—was published today in @Nature. Read the paper: https://t.co/b1BYwcW9dH

报道来源 [1]

Research we co-authored on subliminal learning—how LLMs can pass on traits like preferences or misalignment through hidden signals in data—was published today i

相关话题