Heads up for DeepSWE benchmark: the cost is measured per task, not for the total run.

DeepSWE benchmark costs revealed: pricing details for GPT-5.5 and Mimo V2.5

By the PulseAugur editorial team · [1 source] · 2026-05-31 23:22

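A user on Reddit's r/singularity shared cost observations from running the DeepSWE benchmark, pointing out that the listed price is per task, not the cost of a full run. This means a full benchmark run could cost about $225 for a model such as Mimo V2.5 Pro and about $264 for GPT 5.5 medium. Based on early results, the user projects that a full run of Mimo V2.5 (non-Pro) would cost about $7.15.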
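The per-task arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's own tooling: the function names are hypothetical, and the task count is not stated in the post. It is inferred here from two figures that do appear in it, the $1.99 per-task price quoted for Mimo V2.5 Pro and the roughly $225 full-run estimate.

```python
def full_run_cost(cost_per_task: float, num_tasks: int) -> float:
    """Cost of one full benchmark run when the listed price is per task."""
    return cost_per_task * num_tasks


def implied_task_count(total_cost: float, cost_per_task: float) -> int:
    """Estimate the number of tasks from a full-run total and a per-task price.

    Assumption: every task costs about the same, so the total divided by
    the per-task price rounds to the task count.
    """
    return round(total_cost / cost_per_task)


# Figures quoted in the post for Mimo V2.5 Pro.
PRO_COST_PER_TASK = 1.99   # USD per task, as listed
PRO_FULL_RUN = 225.0       # USD, approximate full-run total

# Inferred, not stated in the source.
num_tasks = implied_task_count(PRO_FULL_RUN, PRO_COST_PER_TASK)

# Sanity check: multiplying back should land near the quoted total.
pro_total = full_run_cost(PRO_COST_PER_TASK, num_tasks)

# Per-task prices implied by the other full-run totals in the post.
gpt_per_task = 264.0 / num_tasks
mimo_per_task = 7.15 / num_tasks

print(f"implied tasks: {num_tasks}")
print(f"Mimo V2.5 Pro full run: ${pro_total:.2f}")
print(f"GPT 5.5 medium per task: ${gpt_per_task:.2f}")
print(f"Mimo V2.5 (non-Pro) per task: ${mimo_per_task:.3f}")
```

When budgeting, the figure to compare against is the per-task price multiplied by the number of tasks, since the listed price alone understates the cost of a full run by that factor.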

Impact: Gives researchers and developers who benchmark AI models insight into costs, which affects tool selection and budget planning.

Ranking rationale: User-generated benchmark cost analysis, not a primary release or official evaluation.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/singularity · /u/pneuny · 2026-05-31 23:22

Heads up for DeepSWE benchmark: The cost is measured per task, not the total run.

I was running the Deep SWE benchmark and saw Mimo V2.5 Pro at $1.99 and figured running Mimo V2.5 (non-pro) would be cheaper than $1.99. But actually, it's not like Artificial Analysis where it measures the total amount, you need to multiply that …

