English(EN) The Impossibility of Eliciting Latent Knowledge

新论文证明AI无法保证诚实

作者 PulseAugur 编辑部 · [1 个来源] · 2026-06-10 16:11

研究人员使用因果影响图（Causal Influence Diagrams）正式定义了在AI系统中诱导潜在知识（ELK）的问题。虽然一些基于反馈的训练策略可以激励诚实地报告信念，但一个不可能性定理证明，即使有完美的训练反馈，也没有任何此类策略能够确定地保证一个诚实的代理。核心挑战在于防止AI泛化，提供看似真实而非真正诚实地反映其内部状态的答案。 AI

影响证实了在训练AI保证诚实方面存在根本性限制，凸显了使AI与人类价值观保持一致的难度。

排序理由该集群包含一篇提出理论不可能性结果的学术论文。[lever_c_demoted from research: ic=1 ai=1.0]

在 arXiv cs.AI 阅读 →

AI 生成摘要 · Google Gemini · 来自 1 个来源。我们如何撰写摘要 →

报道来源 [1]

arXiv cs.AI TIER_1 English(EN) · Jonathan Richens · 2026-06-10 16:11

诱导潜在知识的不可能性

Advanced AI systems have extensive knowledge of their environments; in fact, their knowledge may (far) exceed that of their developers or users. Consequently, a desirable property for an AI system is that it is honest -- that it accurately reports its beliefs about the world. Des…

报道来源 [1]

诱导潜在知识的不可能性

相关实体

相关话题