Sakana AI model outperforms Claude Opus and GPT-5.5 on SWE-Bench Pro

By PulseAugur Editorial · [1 source] · 2026-06-24 04:51

Sakana, a Tokyo-based lab, has developed an AI model that commands GPT-5.5 and scores 73.7 on the SWE-Bench Pro benchmark. That is 4.5 points above Anthropic's Claude Opus 4.8, which scored 69.2, and 15.1 points above OpenAI's GPT-5.5, which scored 58.6 on the same test. The result points to progress in AI agent capabilities on coding tasks.

IMPACT The result raises the bar for AI agent performance on coding tasks and may influence how future models are developed and evaluated.

RANK_REASON The item reports on a new benchmark score for an AI model, which is a research milestone.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

Towards AI TIER_1 English(EN) · Chew Loong Nian - AI ENGINEER · 2026-06-24 04:51

Sakana Trained One AI to Command GPT-5.5,

https://pub.towardsai.net/sakana-trained-one-ai-to-command-gpt-5-5-ed3725ba9187?source=rss----98111c9905da---4
