PulseAugur

AI Models Achieve 90% on Agent Coding Benchmark, Some at Fraction of Cost

A recent benchmark evaluated 148 models on agent coding tasks. Two of them, Qwen3 Coder 30B A3B and the original DeepSeek Chat, reached a 90% success rate. Qwen3 Coder completed the tasks in 28 seconds for $0.0004, while DeepSeek Chat took 59 seconds for $0.0018. Liquid's LFM 2 24B A2B was the most cost-effective, scoring 85% for $0.0002 across ten tasks.
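One way to compare these results is cost per percentage point of score. The sketch below is illustrative only: it uses the three figures quoted above and assumes each quoted cost is the total for the full ten-task run, which the summary states explicitly only for LFM 2 24B A2B. The names `results` and `cost_per_point` are hypothetical and not part of the benchmark.

```python
# Figures as quoted in the summary; treating every cost as the total for the
# whole ten-task run is an assumption, not something the source confirms.
results = {
    "Qwen3 Coder 30B A3B": {"score_pct": 90, "cost_usd": 0.0004},
    "DeepSeek Chat": {"score_pct": 90, "cost_usd": 0.0018},
    "LFM 2 24B A2B": {"score_pct": 85, "cost_usd": 0.0002},
}


def cost_per_point(entry):
    """Return dollars spent per percentage point of benchmark score."""
    return entry["cost_usd"] / entry["score_pct"]


# Sort model names from cheapest to most expensive per point of score.
ranked = sorted(results, key=lambda name: cost_per_point(results[name]))

for name in ranked:
    print(f"{name}: ${cost_per_point(results[name]):.2e} per point")
```

On these numbers LFM 2 24B A2B ranks first, Qwen3 Coder 30B A3B second, and DeepSeek Chat third. The metric ignores wall-clock time, so it is only one of several reasonable ways to rank the models.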

IMPACT Highlights significant cost-efficiency gains in coding agent models, potentially lowering barriers for complex AI task implementation.

RANK_REASON This is a benchmark evaluation of multiple AI models on a specific task, not a release of a new frontier model or a significant industry event.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. dev.to (LLM tag) · TIER_1 · English (EN) · Vilius

    Two Models Just Hit 90% on Agent Coding. One Cost Less Than a Penny.

    By Vilius Vystartas | May 2026

    Ten more models through the same 10 agent coding tasks. Two tied the all-time record. One cost $0.0002. The other hit the score at $0.0018, cheaper than most models scoring 70%.

    Batch 10 was the cheapest one yet.

    …