Brief · PulseAugur

TOOL · r/singularity · English (EN) · 6h

Fable 5 scores below even Gemini 3.1 on LiveBench

A new LiveBench evaluation shows Fable 5, a model from Anthropic, scoring below Gemini 3.1. The result is notable given the model's expected capabilities, and it raises questions about either the benchmark's accuracy or Anthropic's evaluation methodology.

IMPACT: The result raises questions about model performance and benchmark validity, and could influence how future models are developed and evaluated.

Anthropic
LiveBench