A recent comparative test of six April-released Large Language Models (LLMs) found that Qwen 3.6 Plus, released 22 days earlier, outperformed the newer DeepSeek V4 Pro. Despite DeepSeek V4 Pro's advanced reasoning architecture and top scores on AIME and SWE-bench, it scored 89 points in the test to Qwen 3.6 Plus's 92. The test also highlighted a significant cost disparity: DeepSeek's Flash variant is 13 times cheaper than its Pro version, though it also scored lower.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Qwen 3.6 Plus's higher score and lower cost relative to newer models like DeepSeek V4 Pro suggest a shift in which LLMs are optimal for production use.
RANK_REASON The cluster reports on comparative benchmark results for multiple LLMs, which falls under research.