How a $0.02/Call Model Scored 78.2% on SWE-bench Verified — Beating Every Model on the Leaderboard

Low-cost AI model beats top models on a coding benchmark with a new context engine

By the PulseAugur editorial team · [1 source] · 2026-05-09 05:53

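A new approach called Xanther Context Engine (XCE) enabled the MiniMax M2.5 model to score 78.2% on the SWE-bench Verified benchmark, surpassing every model on the official mini-SWE-agent leaderboard. The result is notable because MiniMax M2.5 is a low-cost model, at just $0.02 per call, and the gain is attributed to better context understanding rather than to a more powerful underlying model. XCE supplies AI coding agents with architectural context, which significantly improves their ability to fix bugs in complex codebases.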

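Impact: Providing architectural context improves the performance of AI coding agents on complex tasks and may lower the cost of software development.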

Ranking rationale: This cluster describes a new method and benchmark results for AI coding agents rather than a frontier-lab release. [lever_c_demoted from research: ic=1 ai=1.0]

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

dev.to — LLM tag TIER_1 English(EN) · Hoyin kyoma · 2026-05-09 05:53

How a $0.02/Call Model Scored 78.2% on SWE-bench Verified — Beating Every Model on the Leaderboard

TL;DR: We added architectural context to AI coding agents via MCP and tested on SWE-bench Verified (500 real bugs). MiniMax M2.5, a model that costs $0.02 per call, scored 78.2%, surpassing every model on the official mini-SWE-agent leaderboard, including Claude Op…
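To make the headline figure concrete, the short sketch below converts the reported score into a count of resolved instances. It assumes that 78.2% is the resolve rate over all 500 SWE-bench Verified instances; the function name `resolved_count` is hypothetical and does not come from the source. It deliberately does not estimate total cost, because the excerpt does not say how many calls an agent makes per bug.

```python
def resolved_count(score_tenths_of_percent: int, total_instances: int) -> int:
    """Convert a score given in tenths of a percent into a whole number of instances.

    Integer arithmetic avoids floating-point rounding surprises.
    Assumption: the score is a resolve rate over all instances in the benchmark.
    """
    return score_tenths_of_percent * total_instances // 1000


# 78.2% is expressed as 782 tenths of a percent; 500 instances per the excerpt.
resolved = resolved_count(782, 500)
unresolved = 500 - resolved
print(f"resolved={resolved}, unresolved={unresolved}")
```

Under that assumption, the score corresponds to 391 of the 500 bugs being fixed.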
