PulseAugur
Low-cost AI model beats top performers on coding benchmark with new context engine

A new method called the Xanther Context Engine (XCE) has enabled the MiniMax M2.5 model to score 78.2% on the SWE-bench Verified benchmark, outperforming every other model on the leaderboard. The result is notable because MiniMax M2.5 is a low-cost model at only $0.02 per call, and the gains are attributed to improved contextual understanding rather than a more powerful underlying model: XCE supplies AI coding agents with architectural context, significantly improving their ability to fix bugs in complex codebases.
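The excerpt says XCE feeds "architectural context" to coding agents but does not publish the implementation. A minimal sketch of the idea, assuming the engine summarizes module-level dependencies for the agent (all names here are illustrative, not the real XCE API):

```python
# Hypothetical sketch: derive a compact "architectural context" string from
# a set of Python source files, the kind of overview an agent could be given
# before attempting a bug fix. Illustrative only; not the actual XCE code.
import ast

def module_imports(source: str) -> set[str]:
    """Top-level modules imported by one Python source file."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found

def architecture_summary(files: dict[str, str]) -> str:
    """Render a per-file dependency overview, one line per module."""
    lines = []
    for path in sorted(files):
        deps = sorted(module_imports(files[path]))
        lines.append(f"{path} -> {', '.join(deps) or '(no imports)'}")
    return "\n".join(lines)
```

A summary like this could be exposed to the agent as a tool (the post mentions doing so via MCP), so the model sees how modules relate before editing any one of them.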

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Enhances AI coding agent performance on complex tasks by providing architectural context, potentially lowering costs for software development.

RANK_REASON The cluster describes a new method and benchmark results for AI coding agents, not a release from a frontier lab.


COVERAGE [1]

  1. dev.to — LLM tag TIER_1 · Hoyin kyoma

    How a $0.02/Call Model Scored 78.2% on SWE-bench Verified — Beating Every Model on the Leaderboard

    TL;DR: We added architectural context to AI coding agents via MCP and tested on SWE-bench Verified (500 real bugs). MiniMax M2.5 — a model that costs $0.02 per call — scored 78.2%, surpassing every model on the official mini-SWE-agent leaderboard, including Claude Op…