Do current LLMs know when to say "I don't know"? AbstentionBench (NeurIPS '25) tests 20 frontier models across 20 unanswerable-question datasets. Reasoning fine
Two new papers evaluate the metacognitive abilities of large language models, specifically their capacity for planning and abstention. The TRIAGE paper found that most frontier and open-source LLMs perform poorly when tasked with planning problem-solving sequences and allocating token budgets without feedback, with reasoning-trained models underperforming standard ones. AbstentionBench revealed that current LLMs struggle to recognize unanswerable questions, and that reasoning fine-tuning can degrade their ability to abstain, as reinforcement learning methods lack a direct gradient for 'I don't know'. AI
IMPACT Reveals significant limitations in current LLMs' planning and self-awareness, impacting agentic system development and reliability.