New framework detects unfaithful chain-of-thought reasoning in LLMs

By PulseAugur Editorial · [1 source] · 2026-05-26 04:00

Researchers have developed a framework called CIE-Scorer that detects when a large language model's chain-of-thought (CoT) reasoning does not accurately reflect its internal decision-making process. The method combines external signals, such as answer consistency, with internal computational evidence obtained by tracing model circuits. CIE-Scorer builds sentence-level circuits efficiently, then compares the resulting internal reasoning graph with the external one and flags discrepancies between them. According to the summary, it achieves state-of-the-art performance on CoT unfaithfulness detection while reducing computational cost.
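As a rough illustration of the graph-comparison idea only, the sketch below scores how far an "external" reasoning graph (dependencies the written CoT claims) diverges from an "internal" one (dependencies recovered from the model). The summary does not say how CIE-Scorer represents or compares these graphs, so everything here is an assumption: the edge-set representation, the function name `edge_discrepancy`, the toy edges, and the use of Jaccard distance as a stand-in metric.

```python
from typing import Set, Tuple

# Assumption: an edge (i, j) means "sentence j depends on sentence i".
Edge = Tuple[int, int]


def edge_discrepancy(external: Set[Edge], internal: Set[Edge]) -> float:
    """Jaccard distance between two edge sets: 0.0 = identical, 1.0 = disjoint."""
    union = external | internal
    if not union:
        # Two empty graphs agree trivially.
        return 0.0
    return 1.0 - len(external & internal) / len(union)


# Hypothetical toy graphs, not taken from the paper.
# The written CoT claims a chain: 0 -> 1 -> 2 -> 3.
external = {(0, 1), (1, 2), (2, 3)}
# The (assumed) internal evidence shows the answer at 3 drawing on 0 directly.
internal = {(0, 1), (0, 3)}

score = edge_discrepancy(external, internal)
print(f"discrepancy = {score:.2f}")
```

A higher score would suggest the written reasoning steps are not the ones the model actually relied on; where to set a decision threshold is left open here.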

IMPACT This research offers a lower-cost way to check whether an LLM's stated reasoning matches its actual decision process, which matters for applications that depend on trustworthy outputs.

RANK_REASON The cluster contains an academic paper detailing a new method for detecting unfaithfulness in LLM reasoning.

paper
safety

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Xu Shen, Zhen Tan, Song Wang, Pingjun Hong, Rui Miao, Xin Wang, Tianlong Chen · 2026-05-26 04:00

Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy

arXiv:2605.25603v1. Abstract: Chain-of-thought (CoT) reasoning improves the problem-solving ability of large language models (LLMs), but generated reasoning traces may not faithfully reflect the model's actual decision process. Existing CoT unfaithfulness detect…
