Brief · PulseAugur

TOOL · r/LocalLLaMA · English (EN) · 2w

SWE-rebench Leaderboard (March, April and May 2026): GPT-5.5, Opus 4.7, Cursor (Composer 2.5), Kimi K2.6 and More

The SWE-rebench leaderboard has been updated with 110 new Python tasks drawn from GitHub PRs spanning March, April, and May 2026. The tasks evaluate whether models can read real issues, edit code, and pass test suites. Future updates are planned to add more models, such as Gemini Flash 3.5 and DeepSeek v4 Pro, along with multilingual tasks and options for local development.

IMPACT: Provides updated benchmark results for AI models on software engineering tasks, which may inform future model development and evaluation strategies.

GPT-5.5
Kimi K2.6
DeepSeek v4 Pro
Opus 4.7
Qwen3.5-397B-A17B
Cursor (Composer 2.5)
Gemini Flash 3.5
SWE-rebench