New LLMs Debut with High Scores; Mistral, DeepSeek Hit 90%

By PulseAugur Editorial · 1 source · 2026-05-26 18:48

A recent agent coding benchmark of ten large language models found that all five model families tested for the first time scored 75% or higher. Two models, Mistral Large 2411 and DeepSeek Chat V3-0324, tied the benchmark's record of 90%. The L3 Lunaris 8B model stood out for its value, scoring 85% at a cost of only $0.0001 per benchmark run.
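
To put the L3 Lunaris 8B price in perspective, the short sketch below converts the reported $0.0001 per-run cost into runs per dollar and into a simple "score points per dollar" figure. Both function names and the points-per-dollar metric are hypothetical illustrations, not part of the benchmark; the only inputs are the 85% score and per-run cost reported above, since costs for the other models were not given.

```python
def runs_per_dollar(cost_per_run_usd: float) -> float:
    """Return how many benchmark runs one US dollar buys at the given per-run cost."""
    if cost_per_run_usd <= 0:
        raise ValueError("cost per run must be positive")
    return 1.0 / cost_per_run_usd


def points_per_dollar(score_pct: float, cost_per_run_usd: float) -> float:
    """Hypothetical value metric: benchmark percentage points per dollar of per-run cost."""
    return score_pct * runs_per_dollar(cost_per_run_usd)


# Figures reported for L3 Lunaris 8B: 85% score, $0.0001 per benchmark run.
lunaris_runs = runs_per_dollar(0.0001)
lunaris_value = points_per_dollar(85.0, 0.0001)

# Thousands separators make the magnitudes easier to read.
print(f"runs per $1: {lunaris_runs:,.0f}")
print(f"points per $1: {lunaris_value:,.0f}")
```

At the reported price, one dollar covers roughly 10,000 benchmark runs, which is the basis for the "value" framing.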

IMPACT Newly tested models scored consistently high on this agent coding benchmark, suggesting rapid progress in both agent capabilities and cost efficiency.

RANK_REASON The article reports benchmark results for multiple LLMs, including newly tested model families and record-tying scores, which falls under research.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

dev.to (LLM tag) · Tier 1 · English (EN) · Vilius · 2026-05-26 18:48

I Tested 10 More Models. Five Brand New Families Debuted. None Scored Below 75%.

By Vilius Vystartas | May 2026

I ran another 10 models through the same agent coding benchmark. Five of them were from completely untested families (Sao10k, Anthracite, Inflection, Mancer, Undi95), and every single one scored 75% or higher on its first try. Th…
