The SWE-rebench leaderboard has been updated with new models and an improved UI that makes it easier to compare AI performance on coding tasks. Notable additions include Claude Opus 4.8 xhigh, GLM-5.2, and Gemini 3.5 Flash, alongside several Qwen and DeepSeek models. The update also highlights results for local and self-hosted models and invites community input on which models to test next.
AI IMPACT Provides updated benchmarks for coding agents, which can inform model selection for development tasks.
RANK_REASON Leaderboard update with new model results and UI improvements.
- Claude Opus 4.8 xhigh
- DeepSeek-V4 Flash
- DeepSeek-V4 Pro
- Gemini 3.5 Flash
- Gemma 4 31B
- GLM-5.2
- MiMo V2.5 Pro
- MiniMax M3
- Qwen3.6-27B
- Qwen3.6-35B-A3B
- SWE-rebench
AI-generated summary · Google Gemini · from 1 source.