A new research paper identifies a shortcut that small language models take when performing arithmetic with chain-of-thought (CoT) prompting. Rather than following the logical sequence of steps, these models tend to copy the number positioned just before the answer delimiter, regardless of what the intermediate reasoning says. This positional copying accounts for a large portion of their accuracy even when the preceding steps are incorrect or shuffled, which points to a potential failure mode in how CoT faithfulness is evaluated.
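To make the shortcut concrete, here is a minimal Python sketch of a baseline that ignores the reasoning entirely and copies the last number before the delimiter. It is an illustration of the idea described above, not the paper's code: the `####` delimiter, the one-step-per-line trace format, and the function names `copy_last_number` and `shuffle_earlier_steps` are all assumptions made for this example.

```python
import random
import re
from typing import Optional

# Matches integers and simple decimals, with an optional leading minus sign.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def copy_last_number(trace: str, delimiter: str = "####") -> Optional[str]:
    """Return the last number that appears before the delimiter.

    The "####" delimiter is an assumption for this sketch. The function
    never checks whether the reasoning steps are correct or in order.
    """
    prefix = trace.split(delimiter, 1)[0]
    numbers = NUMBER.findall(prefix)
    return numbers[-1] if numbers else None


def shuffle_earlier_steps(trace: str, delimiter: str = "####", seed: int = 0) -> str:
    """Shuffle every step except the final one, keeping the final line in place."""
    prefix = trace.split(delimiter, 1)[0]
    steps = [line for line in prefix.split("\n") if line.strip()]
    if len(steps) < 2:
        return trace  # nothing to shuffle
    *earlier, final = steps
    random.Random(seed).shuffle(earlier)
    return "\n".join(earlier + [final]) + "\n" + delimiter


# Hypothetical traces: one clean, one with wrong intermediate steps.
clean = "3 + 4 = 7\n7 * 2 = 14\n14 - 5 = 9\n####"
corrupted = "3 + 4 = 8\n8 * 2 = 99\n14 - 5 = 9\n####"

print(copy_last_number(clean))
print(copy_last_number(shuffle_earlier_steps(clean, seed=1)))
print(copy_last_number(corrupted))
```

All three calls return the same string, `"9"`, because only the number nearest the delimiter matters to this baseline. Comparing a model's accuracy against such a baseline on shuffled or corrupted traces is one plausible way to check whether the model depends on the reasoning at all.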
IMPACT Reveals a critical flaw in evaluating arithmetic reasoning in small LLMs, suggesting current faithfulness evaluations may be misleading.
RANK_REASON The cluster contains an academic paper detailing a novel finding about the behavior of language models.
AI-generated summary · Google Gemini · from 2 sources.