A recent benchmark evaluated 148 models on agentic coding tasks. Two of them, Qwen3 Coder 30B A3B and the original DeepSeek Chat, achieved a 90% success rate. Qwen3 Coder completed the tasks in 28 seconds for $0.0004, while DeepSeek Chat took 59 seconds for $0.0018. Liquid's LFM 2 24B A2B stood out as the most cost-effective, scoring 85% for just $0.0002 across ten tasks.
AI IMPACT Highlights significant cost-efficiency gains in coding agent models, potentially lowering barriers for complex AI task implementation.
RANK_REASON This is a benchmark evaluation of multiple AI models on a specific task, not a release of a new frontier model or a significant industry event.
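The cost-effectiveness comparison in the summary can be checked with a little arithmetic. The sketch below is illustrative only: the success rates and dollar figures are the ones quoted in the summary, while the `Result` type, the `cost_per_success_point` metric, and the assumption that each cost covers the same ten-task run are hypothetical choices made here, not something the benchmark itself defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One model's figures as quoted in the summary (hypothetical container)."""
    model: str
    success_pct: float  # success rate in percent
    cost_usd: float     # assumed to be the total cost of the same ten-task run


# Figures copied from the summary; nothing else from the benchmark is assumed.
RESULTS = [
    Result("Qwen3 Coder 30B A3B", 90.0, 0.0004),
    Result("DeepSeek Chat", 90.0, 0.0018),
    Result("LFM 2 24B A2B", 85.0, 0.0002),
]


def cost_per_success_point(result: Result) -> float:
    """Dollars spent per percentage point of success (an assumed metric)."""
    return result.cost_usd / result.success_pct


def rank_by_cost_efficiency(results: list[Result]) -> list[Result]:
    """Return results ordered from cheapest to most expensive per success point."""
    return sorted(results, key=cost_per_success_point)


if __name__ == "__main__":
    for r in rank_by_cost_efficiency(RESULTS):
        print(f"{r.model}: ${cost_per_success_point(r):.8f} per success point")
```

Under this metric LFM 2 24B A2B ranks first despite its lower score, and DeepSeek Chat costs 4.5 times as much as Qwen3 Coder for the same 90% success rate. A different metric, such as one that also weights completion time, could order the models differently.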
- Aion 1.0
- Baidu Ernie 4.5 300B
- Claude Opus 4
- Cydonia 24B V4.1
- DeepSeek Chat
- LFM 2 24B A2B
- MiniMax M2 Her
- Mistral Small 3.2
- OpenRouter
- Qwen3 14B
- Qwen3.7 Max
- Qwen3 Coder 30B A3B
- TheDrummer
AI-generated summary · Google Gemini · from 1 source.