Two Models Just Hit 90% on Agent Coding. One Cost Less Than a Penny.

AI models reach 90% on an agent coding benchmark, some at very low cost

By the PulseAugur editorial team · [1 source] · 2026-05-26 09:46

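A recent benchmark evaluated 148 models on agent coding tasks. Two of them, Qwen3 Coder 30B A3B and the original DeepSeek Chat, reached a 90% success rate. Qwen3 Coder completed the tasks in 28 seconds at a cost of $0.0004, while DeepSeek Chat took 59 seconds and cost $0.0018. Liquid's LFM 2 24B A2B scored 85% across the ten tasks at a cost of $0.0002, making it the most cost-effective model.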
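The cost-effectiveness comparison can be checked with a little arithmetic. The sketch below is only an illustration: it assumes "cost-effective" means success rate divided by quoted cost, which the summary does not state, and it assumes the three cost figures are measured on the same basis. The names `Result`, `success_per_dollar`, and `rank_by_cost_effectiveness` are hypothetical and not part of the benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    success_rate: float  # fraction of tasks passed, as quoted in the summary
    cost_usd: float      # cost as quoted in the summary (basis assumed equal)


# Figures copied from the summary above; nothing here is newly measured.
RESULTS = [
    Result("Qwen3 Coder 30B A3B", 0.90, 0.0004),
    Result("DeepSeek Chat (original)", 0.90, 0.0018),
    Result("LFM 2 24B A2B", 0.85, 0.0002),
]


def success_per_dollar(result: Result) -> float:
    """Assumed metric: success rate divided by quoted cost in USD."""
    return result.success_rate / result.cost_usd


def rank_by_cost_effectiveness(results: list[Result]) -> list[Result]:
    """Sort results from most to least success per dollar."""
    return sorted(results, key=success_per_dollar, reverse=True)


if __name__ == "__main__":
    for r in rank_by_cost_effectiveness(RESULTS):
        print(f"{r.model}: {success_per_dollar(r):,.0f} success-rate units per USD")
```

Under this assumed metric, LFM 2 24B A2B ranks first at roughly 4,250, followed by Qwen3 Coder 30B A3B at roughly 2,250 and DeepSeek Chat at roughly 500, which matches the summary's claim that the Liquid model is the most cost-effective despite its lower score.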

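Impact: Highlights a marked improvement in the cost-efficiency of coding agent models, which may lower the barrier to running complex AI tasks.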

Ranking rationale: This is a benchmark evaluation of multiple AI models on a specific task, not a new frontier model release or a major industry event.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

dev.to — LLM tag TIER_1 English(EN) · Vilius · 2026-05-26 09:46

Two Models Just Hit 90% on Agent Coding. One Cost Less Than a Penny.

By Vilius Vystartas | May 2026. Ten more models through the same 10 agent coding tasks. Two tied the all-time record. One cost $0.0002. The other hit the score at $0.0018, cheaper than most models scoring 70%. Batch 10 was the cheapest one yet. …

