PulseAugur

SWE-rebench leaderboard adds Claude Opus 4.8, GLM-5.2, Gemini 3.5 Flash

The SWE-rebench leaderboard has been updated with new models and an improved UI, making it easier to compare AI performance on coding tasks. Notable additions include Claude Opus 4.8 xhigh, GLM-5.2, and Gemini 3.5 Flash, alongside several Qwen and DeepSeek models. The update also highlights results for local and self-hosted models, and the post invites the community to suggest which models to test next.

IMPACT Provides updated benchmarks for coding agents, which can inform model selection for development tasks.

RANK_REASON Leaderboard update with new model results and UI improvements.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. r/LocalLLaMA · TIER_1 · English (EN) · /u/Fabulous_Pollution10

    SWE-rebench leaderboard update: GLM-5.2, Qwen3.6-27B, Qwen3.6-35B-A3B, Gemma 4 31B and more + improved UI

    https://www.reddit.com/r/LocalLLaMA/comments/1uknx14/swerebench_leaderboard_update_glm52_qwen3627b/