Fable 5 scores below even Gemini 3.1 on LiveBench
A new LiveBench evaluation shows Fable 5, a model from Anthropic, scoring below Gemini 3.1. The result is notable given Fable 5's expected capabilities, and it raises questions about either the benchmark's accuracy or Anthropic's evaluation methodology.
AI IMPACT: Raises questions about model performance and benchmark validity, which could influence future model development and evaluation strategies.