PulseAugur

Low-cost AI model beats top performers on coding benchmark with new context engine

A new method called Xanther Context Engine (XCE) enabled the MiniMax M2.5 model to score 78.2% on the SWE-bench Verified benchmark (500 real bugs), surpassing every model on the official mini-SWE-agent leaderboard. The result is notable because MiniMax M2.5 costs only $0.02 per call, and the gains are attributed to better context rather than a more powerful underlying model. XCE supplies AI coding agents with architectural context via MCP, which improves their ability to fix bugs in complex codebases.

IMPACT Enhances AI coding agent performance on complex tasks by providing architectural context, potentially lowering costs for software development.

RANK_REASON The cluster describes a new method and benchmark results for AI coding agents, not a release from a frontier lab.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. dev.to — LLM tag TIER_1 English(EN) · Hoyin kyoma ·

    How a $0.02/Call Model Scored 78.2% on SWE-bench Verified — Beating Every Model on the Leaderboard

    TL;DR: We added architectural context to AI coding agents via MCP and tested on SWE-bench Verified (500 real bugs). MiniMax M2.5, a model that costs $0.02 per call, scored 78.2%, surpassing every model on the official mini-SWE-agent leaderboard, including Claude Op…