Researchers have developed a method to distinguish genuine reasoning steps from superficial ones in the chain-of-thought (CoT) outputs of large language models. The method, called the True Thinking Score (TTS), shows that LLMs often generate reasoning steps that do not causally contribute to the final answer; only a small percentage of steps are truly influential. The study also found that self-verification steps, including so-called 'aha moments', can be merely decorative, and that models can be guided to internally follow the true reasoning path the method identifies.
Impact: Challenges the trustworthiness of LLM reasoning and highlights potential inefficiencies in CoT generation.
Ranking rationale: An academic paper introducing a new metric and new findings about LLM reasoning.
AI-generated summary · Google Gemini · from 2 sources.