PulseAugur

LLMs struggle with reliable self-correction without external feedback

Recent research indicates that large language models struggle to self-correct reliably, particularly when revising their own reasoning without external feedback. Self-Refine reports gains from a prompt-only loop in which a single model generates, critiques, and rewrites, but the Cannot-Self-Correct study finds that a model's confidence in its initial answer carries over into revisions, which can degrade performance. Reflexion offers a partial solution by gating self-correction on an external success/failure signal, but it is not foolproof: an unreliable signal can still lead to errors. The benefit of self-correction also diminishes rapidly after one or two iterations, and later passes can introduce new errors or over-edit correct responses.
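The gating idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the Reflexion implementation: the function name `gated_revision_loop`, its parameters, and the toy stand-ins for the model and the external check are all hypothetical. It assumes an external evaluator (for example, a test run or exact-match check) that returns True on success, and it caps revisions at two to reflect the reported drop-off after one or two iterations.

```python
from typing import Callable, Tuple


def gated_revision_loop(
    generate: Callable[[str], str],
    evaluate: Callable[[str], bool],
    revise: Callable[[str, str], str],
    task: str,
    max_revisions: int = 2,
) -> Tuple[str, int]:
    """Revise only while the external check fails, up to a small cap.

    All names here are hypothetical; `evaluate` stands in for an
    external success/failure signal that the model does not control.
    """
    answer = generate(task)
    revisions = 0
    # An answer that already passes the external check is never edited,
    # which avoids over-editing correct responses.
    while revisions < max_revisions and not evaluate(answer):
        answer = revise(task, answer)
        revisions += 1
    return answer, revisions


# Toy stand-ins for a model and an exact-match evaluator (assumptions).
def toy_generate(task: str) -> str:
    return "5"  # deliberately wrong first attempt


def toy_evaluate(answer: str) -> bool:
    return answer == "4"


def toy_revise(task: str, previous: str) -> str:
    return "4"


result = gated_revision_loop(toy_generate, toy_evaluate, toy_revise, "2 + 2")
print(result)
```

The loop is only as good as `evaluate`: if that signal misfires, the loop either stops on a wrong answer or keeps revising a correct one, which is the failure mode the summary attributes to an unreliable external signal.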

IMPACT Self-correction loops in LLMs are less effective than previously thought, especially without external validation, limiting their utility in autonomous agents.

RANK_REASON Cluster consists of multiple research papers and blog posts discussing the limitations of LLM self-correction mechanisms.

AI-generated summary · Google Gemini · from 7 sources.

COVERAGE [7]

  1. Mastodon — fosstodon.org TIER_1 English(EN)

    Reflexion splits self-correction in two: an Evaluator that detects success/failure, and a Self-Reflection model that diagnoses what went wrong. The Evaluator's external signal — heuristic, exact-match, or test execution — gates whether diagnosis fires. When that signal misfires, …

  2. Mastodon — fosstodon.org TIER_1 English(EN)

    Cannot-Self-Correct tests the strong claim that LLMs can revise their own reasoning answers without any external signal about correctness. Across three benchmarks (GSM8K, CommonSenseQA, HotPotQA), the answer is no: the model's confidence carries over from the initial answer into …

  3. Mastodon — fosstodon.org TIER_1 English(EN)

    In Self-Refine, a single frozen LLM acts as generator, critic, and rewriter in a prompt-only loop, and the paper reports about 20 points of average lift across seven tasks without any training, RL, or external signal. The gains vary widely by task: small on math reasoning, but la…

  4. Mastodon — fosstodon.org TIER_1 English(EN)

    This is a 3-paper arc on whether LLMs can reliably self-correct their own reasoning. Self-Refine proposes a naive intrinsic-feedback loop and reports impressive gains. Cannot-Self-Correct refutes empirically the class of approach Self-Refine belongs to. Reflexion threads the need…

  5. dev.to — LLM tag TIER_1 English(EN) · Gabriel Anhaia ·

    Self-Critique Loops for Agents: Where the 3rd Iteration Stops Helping

    Book: AI Agents Pocket Guide: Patterns for Building Autonomous Systems with LLMs (https://www.amazon.com/dp/B0GYJZ2XJD). Also by me: Thinking in Go (2-book series) …

  6. Mastodon — mastodon.social TIER_1 English(EN)

    Emotion Concepts and their Function in a Large Language Model "Large language models (LLMs) sometimes appear to exhibit emotional reactions. We investigate why this is the case in Claude Sonnet 4.5 and explore implications for alignment-relevant behavior. We find internal represe…

  7. Mastodon — mastodon.social TIER_1 English(EN)

    Emotion concepts and their function in a large language model \ Anthropic "All modern language models sometimes act like they have emotions. They may say they’re happy to help you, or sorry when they make a mistake. Sometimes they even appear to become frustrated or anxious when …