Claude Fable-5 achieved a leading score of 88.0% on the Terminal-Bench 2.1 benchmark, surpassing GPT-5.5. However, this model has been unavailable since June 12 due to a US export-control order. Among currently accessible tools, Codex CLI powered by GPT-5.5 leads with 83.4%, narrowly ahead of Claude Code using Opus 4.8 at 82.7%. The benchmark highlights that the effectiveness of coding agents depends significantly on their surrounding harness and tooling, not just the underlying model.
AI IMPACT Highlights the critical role of tooling and availability over raw model performance in practical AI applications.
RANK_REASON Benchmark results and analysis of AI models for coding tasks.
Read on dev.to (Claude Code tag)
AI-generated summary · Google Gemini · from 1 source.