PulseAugur / Brief

last 24h
[1/1] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Spent $11k evaluating Fable: capability looked SOTA, refusals killed it (before Anthropic did)

    An independent evaluator spent over $11,000 testing Anthropic's Claude Fable 5 model, expecting it to outperform GPT-5.5. Instead, the model refused requests at a high rate, which caused timeouts and failures on 13 tasks in the WolfBench benchmark. The refusals, though intended as a safety measure, hurt performance in agentic workflows: the model burned tokens and failed tasks that other models, such as Claude Opus and GPT-5.5, could solve.

    IMPACT Excessive safety refusals in LLM agents can lead to token waste and task failure, hindering practical application despite strong underlying capabilities.