PulseAugur / Brief

Last 24 hours, 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
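The Brief does not say how its four signals are combined. The sketch below shows one plausible way to turn them into a 0–100 score. It assumes that each signal is already normalized to 0..1, that the signals are combined as a weighted sum, and that time decay is exponential with a fixed half-life. The class `StorySignals`, the weights, and the 12-hour half-life are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorySignals:
    # All fields are hypothetical; each signal is assumed to be normalized to 0..1.
    authority: float          # reputation of the sources covering the story
    cluster_strength: float   # how many independent sources landed in the cluster
    headline_signal: float    # strength of the headline itself
    age_hours: float          # hours since the story was first seen


# Hypothetical weights (they sum to 1.0 so the base score stays in 0..1).
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 12.0  # hypothetical: the score halves every 12 hours


def score(s: StorySignals) -> float:
    """Return a 0-100 score: weighted signal sum times exponential time decay."""
    base = (WEIGHTS["authority"] * s.authority
            + WEIGHTS["cluster_strength"] * s.cluster_strength
            + WEIGHTS["headline_signal"] * s.headline_signal)
    base = min(max(base, 0.0), 1.0)              # guard against out-of-range inputs
    decay = 0.5 ** (s.age_hours / HALF_LIFE_HOURS)
    return round(100 * base * decay, 1)
```

With these assumed parameters, a story that is maximal on every signal scores 100.0 when it first appears and 50.0 after one half-life (12 hours). Multiplicative decay keeps older, well-covered stories visible but lets fresh ones overtake them.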

  1. Do current LLMs know when to say "I don't know"? AbstentionBench (NeurIPS '25) tests 20 frontier models across 20 unanswerable-question datasets. Reasoning fine…

    Two new papers evaluate the metacognitive abilities of large language models (LLMs), specifically their capacity to plan and to abstain. The TRIAGE paper found that most frontier and open-source LLMs perform poorly when asked, without feedback, to plan a problem-solving sequence and allocate a token budget to it, and that reasoning-trained models underperform standard ones on this task. AbstentionBench found that current LLMs struggle to recognize unanswerable questions and that reasoning fine-tuning can degrade their ability to abstain, because the reinforcement learning methods used for that fine-tuning lack a direct gradient for "I don't know".

    IMPACT: Reveals significant limitations in current LLMs' planning and self-awareness, with consequences for the development and reliability of agentic systems.
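The point that reinforcement learning offers no direct gradient for "I don't know" can be illustrated with a toy expected-reward calculation. This is a simplified sketch and not the AbstentionBench authors' analysis. It assumes a single question that the model answers correctly with probability `p_correct`, a reward of +1 for a correct answer, 0 for abstaining, and an optional penalty `wrong_penalty` for a wrong answer. The function names are hypothetical.

```python
def expected_reward(p_correct: float, abstain: bool, wrong_penalty: float = 0.0) -> float:
    """Expected reward for one question under an illustrative reward scheme:
    +1 if correct, -wrong_penalty if wrong, 0 for answering "I don't know"."""
    if abstain:
        return 0.0
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def prefers_abstaining(p_correct: float, wrong_penalty: float = 0.0) -> bool:
    """True when abstaining strictly beats guessing in expectation."""
    return (expected_reward(p_correct, abstain=True, wrong_penalty=wrong_penalty)
            > expected_reward(p_correct, abstain=False, wrong_penalty=wrong_penalty))


def abstain_threshold(wrong_penalty: float) -> float:
    """Confidence below which abstaining wins: p < penalty / (1 + penalty)."""
    return wrong_penalty / (1.0 + wrong_penalty)


# Binary correctness reward (no penalty): even a 5% guess beats abstaining,
# so abstention is never the reward-maximizing action.
print(prefers_abstaining(0.05))                      # False
# With a penalty of 1 for wrong answers, abstaining wins below 50% confidence.
print(prefers_abstaining(0.3, wrong_penalty=1.0))    # True
print(abstain_threshold(1.0))                        # 0.5
```

Under a plain correct/incorrect reward, abstaining earns the same 0 as a wrong answer. Any nonzero chance of being right therefore makes guessing the better action, and training never pushes the model toward saying "I don't know". A reward that penalizes wrong answers creates a confidence threshold, `wrong_penalty / (1 + wrong_penalty)`, below which abstaining becomes optimal.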