Researchers have identified a phenomenon in medical large language models called Overthinking (OT), where models answer correctly in standard QA but fail in extended chain-of-thought reasoning. This failure state is linearly decodable with high accuracy, yet attempts to correct it using fixed linear steering methods proved ineffective across different architectures and domains. The study suggests that the failure signals are entangled with critical task computations, hindering direct correction but enabling improved post-generation reliability estimation. AI
影响 Identifies a specific failure mode in LLMs that hinders correction but aids reliability estimation.
排序理由 Academic paper detailing a specific failure mode in LLMs and exploring correction methods. [lever_c_demoted from research: ic=1 ai=1.0]
AI 生成摘要 · Google Gemini · 来自 1 个来源。 我们如何撰写摘要 →