Two recent papers report conflicting findings on whether large language models can reliably abstain from answering and whether chain-of-thought prompting helps them do so. A COLING 2025 study suggests that prompted chain-of-thought increases abstention in instruction-tuned models. By contrast, the AbstentionBench paper from NeurIPS 2025 indicates that expanding the reasoning budget reduces abstention in models trained for reasoning.
IMPACT: Conflicting research on LLM abstention highlights ongoing challenges in model control and reliability.
AI-generated summary · Google Gemini · from 1 source.