AI Models Achieve 90% on Agent Coding Benchmark, Some at Fraction of Cost

By PulseAugur Editorial · [1 source] · 2026-05-26 09:46

A recent benchmark test evaluated 148 models on agent coding tasks, with two models, Qwen3 Coder 30B A3B and the original DeepSeek Chat, achieving a 90% success rate. The Qwen3 Coder model completed the tasks in 28 seconds for $0.0004, while DeepSeek Chat took 59 seconds for $0.0018. Liquid's LFM 2 24B A2B stood out as the most cost-effective, scoring 85% for a mere $0.0002 across ten tasks.
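To make the cost comparison concrete, here is a minimal Python sketch that ranks the three models named above by dollars spent per percentage point of success. The "cost per point" metric, the `Result` class, and the function name are illustrative assumptions, not something the benchmark author reports, and the sketch assumes each quoted cost is the total for the full ten-task run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    score_pct: float  # success rate quoted in the summary
    cost_usd: float   # cost quoted in the summary (assumed: total for the run)


# Figures copied from the summary above; nothing here is newly measured.
RESULTS = [
    Result("Qwen3 Coder 30B A3B", 90.0, 0.0004),
    Result("DeepSeek Chat", 90.0, 0.0018),
    Result("LFM 2 24B A2B", 85.0, 0.0002),
]


def cost_per_point(r: Result) -> float:
    """Hypothetical metric: dollars per percentage point of success rate."""
    return r.cost_usd / r.score_pct


# Lower cost per point means more score bought per dollar.
ranked = sorted(RESULTS, key=cost_per_point)

for r in ranked:
    print(f"{r.model}: ${cost_per_point(r):.8f} per point")
```

Under these assumptions the ordering matches the summary's claim: LFM 2 24B A2B comes out cheapest per point, followed by Qwen3 Coder 30B A3B and then DeepSeek Chat. A metric like this ignores latency and the five-point score gap, so it is only one way to read the numbers.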

IMPACT Highlights significant cost-efficiency gains in coding agent models, potentially lowering barriers for complex AI task implementation.

RANK_REASON This is a benchmark evaluation of multiple AI models on a specific task, not a release of a new frontier model or a significant industry event.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

dev.to — LLM tag TIER_1 English(EN) · Vilius · 2026-05-26 09:46

Two Models Just Hit 90% on Agent Coding. One Cost Less Than a Penny.

By Vilius Vystartas | May 2026

Ten more models through the same 10 agent coding tasks. Two tied the all-time record. One cost $0.0002. The other hit the score at $0.0018, cheaper than most models scoring 70%. Batch 10 was the cheapest one yet. …
