Self-Critique Loops for Agents: Where the 3rd Iteration Stops Helping

LLMs struggle to self-correct reliably without external feedback

By the PulseAugur editorial team · [7 sources] · 2026-04-30 21:59

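Recent research suggests that large language models struggle to self-correct reliably, especially when they try to revise their own reasoning without external feedback. Studies of Self-Refine and Cannot-Self-Correct indicate that a model's initial confidence often carries over into its revision, which can degrade performance. Methods such as Reflexion offer a partial solution by gating self-correction on an external success/failure signal, but they are not foolproof: an unreliable signal can still lead to errors. The benefit of self-correction also drops off quickly after one or two iterations, and later iterations may introduce new errors or over-edit responses that were already correct.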
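
The two failure modes named above (too many iterations, and edits that damage an already correct answer) can be guarded against with a small wrapper. The following is a minimal sketch, not taken from any of the cited papers: `bounded_self_correct`, `revise`, `verify`, `max_rounds`, and `pass_score` are hypothetical names, and `verify` is assumed to be some external check (tests, exact match, a heuristic) rather than the model's own opinion.

```python
from typing import Callable


def bounded_self_correct(
    draft: str,
    revise: Callable[[str], str],    # hypothetical: one model revision step
    verify: Callable[[str], float],  # hypothetical: external score in [0, 1]
    max_rounds: int = 2,             # assumption: cap at one or two revisions
    pass_score: float = 1.0,
) -> str:
    """Revise at most `max_rounds` times and never return a lower-scoring answer."""
    best, best_score = draft, verify(draft)
    current = draft
    for _ in range(max_rounds):
        if best_score >= pass_score:
            break  # the external check already passes, so stop editing
        current = revise(current)
        score = verify(current)
        if score > best_score:
            best, best_score = current, score  # keep only strict improvements
    return best


if __name__ == "__main__":
    # Toy stand-ins: the "model" walks 3 -> 4 -> 5, and only "4" is correct.
    toy_revise = lambda s: {"3": "4", "4": "5"}.get(s, s)
    toy_verify = lambda s: 1.0 if s == "4" else 0.0
    print(bounded_self_correct("3", toy_revise, toy_verify))  # prints 4
```

In the toy run the loop stops as soon as the verifier passes, so the later revision that would turn "4" into "5" never happens. How well this works in practice depends entirely on how trustworthy `verify` is.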

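Impact: Self-correction loops in LLMs are less effective than previously thought, especially without external verification, which limits their usefulness in autonomous agents.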

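Ranking rationale: This cluster contains several research papers and blog posts that discuss the limitations of LLM self-correction mechanisms.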

AI-generated summary · Google Gemini · from 7 sources.

Sources [7]

Mastodon (fosstodon.org) · TIER_1 · English (EN) · 2026-05-17 05:17

Reflexion splits self-correction in two: an Evaluator that detects success/failure, and a Self-Reflection model that diagnoses what went wrong. The Evaluator's external signal — heuristic, exact-match, or test execution — gates whether diagnosis fires. When that signal misfires, …

Link: benjaminhan.net/…/20260516-reflexion
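
The split described in this post, an Evaluator whose external signal gates whether the Self-Reflection step fires, can be sketched roughly as below. This is an illustrative outline under stated assumptions, not the Reflexion authors' implementation: the class name `ReflexionStyleAgent`, the callables `act`, `evaluate`, and `reflect`, and the `max_trials` parameter are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ReflexionStyleAgent:
    act: Callable[[str, List[str]], str]  # hypothetical: (task, reflections) -> attempt
    evaluate: Callable[[str], bool]       # hypothetical: external success/failure signal
    reflect: Callable[[str, str], str]    # hypothetical: (task, failed attempt) -> diagnosis
    reflections: List[str] = field(default_factory=list)

    def run(self, task: str, max_trials: int = 3) -> Tuple[str, bool]:
        attempt = ""
        for _ in range(max_trials):
            attempt = self.act(task, self.reflections)
            if self.evaluate(attempt):
                return attempt, True  # success reported: diagnosis never fires
            # Failure reported: store a diagnosis for the next trial to read.
            self.reflections.append(self.reflect(task, attempt))
        return attempt, False


if __name__ == "__main__":
    agent = ReflexionStyleAgent(
        act=lambda task, notes: "good" if notes else "bad",
        evaluate=lambda attempt: attempt == "good",
        reflect=lambda task, attempt: f"'{attempt}' failed the check",
    )
    print(agent.run("toy task"))  # prints ('good', True)
```

Because reflection runs only when `evaluate` reports failure, an evaluator that wrongly reports success returns the first attempt unchanged. In this sketch that is the direct consequence of the gating, which is why the quality of the external signal matters so much.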
Mastodon (fosstodon.org) · TIER_1 · English (EN) · 2026-05-17 05:17

Cannot-Self-Correct tests the strong claim that LLMs can revise their own reasoning answers without any external signal about correctness. Across three benchmarks (GSM8K, CommonSenseQA, HotPotQA), the answer is no: the model's confidence carries over from the initial answer into …

Link: benjaminhan.net/…/20260516-cannot-self-co…
Mastodon (fosstodon.org) · TIER_1 · English (EN) · 2026-05-17 05:16

In Self-Refine, a single frozen LLM acts as generator, critic, and rewriter in a prompt-only loop, and the paper reports about 20 points of average lift across seven tasks without any training, RL, or external signal. The gains vary widely by task: small on math reasoning, but la…

Link: benjaminhan.net/…/20260516-self-refine
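
The prompt-only loop described in this post, where one frozen model plays generator, critic, and rewriter, might look roughly like the sketch below. It is a simplified illustration, not the Self-Refine paper's prompts or code: `self_refine`, the `llm` callable, the prompt wording, the `STOP` phrase, and `max_iters` are all assumptions made for the example.

```python
from typing import Callable

STOP = "NO CHANGES NEEDED"  # hypothetical stop phrase the critic may return


def self_refine(task: str, llm: Callable[[str], str], max_iters: int = 2) -> str:
    """One model, three roles: generate, critique, rewrite. No external signal."""
    # Role 1: generator
    answer = llm(f"Task: {task}\nWrite an answer.")
    for _ in range(max_iters):
        # Role 2: critic (the same model judging its own output)
        critique = llm(
            f"Task: {task}\nAnswer: {answer}\n"
            f"Critique this answer, or reply '{STOP}'."
        )
        if STOP in critique:
            break
        # Role 3: rewriter
        answer = llm(
            f"Task: {task}\nAnswer: {answer}\nCritique: {critique}\n"
            "Rewrite the answer."
        )
    return answer
```

Nothing in this loop checks correctness from outside the model. The critic's verdict is the only stopping signal, which is the property the Cannot-Self-Correct post in this cluster calls into question.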
Mastodon (fosstodon.org) · TIER_1 · English (EN) · 2026-05-17 05:16

This is a 3-paper arc on whether LLMs can reliably self-correct their own reasoning. Self-Refine proposes a naive intrinsic-feedback loop and reports impressive gains. Cannot-Self-Correct refutes empirically the class of approach Self-Refine belongs to. Reflexion threads the need…
dev.to (LLM tag) · TIER_1 · English (EN) · Gabriel Anhaia · 2026-05-07 19:02

Self-Critique Loops for Agents: Where the 3rd Iteration Stops Helping

Book: AI Agents Pocket Guide: Patterns for Building Autonomous Systems with LLMs (https://www.amazon.com/dp/B0GYJZ2XJD). Also by me: Thinking in Go (2-book series) …
Mastodon (mastodon.social) · TIER_1 · English (EN) · 2026-04-30 22:02

Emotion Concepts and their Function in a Large Language Model "Large language models (LLMs) sometimes appear to exhibit emotional reactions. We investigate why this is the case in Claude Sonnet 4.5 and explore implications for alignment-relevant behavior. We find internal represe…

Link: transformer-circuits.pub/…/index.html
Mastodon (mastodon.social) · TIER_1 · English (EN) · 2026-04-30 21:59

Emotion concepts and their function in a large language model \ Anthropic "All modern language models sometimes act like they have emotions. They may say they’re happy to help you, or sorry when they make a mistake. Sometimes they even appear to become frustrated or anxious when …

Link: anthropic.com/…/emotion-concepts-function
