A recent battle test of six large language models (LLMs) released in April found that Qwen 3.6 Plus, released 22 days earlier, outperformed the newer DeepSeek V4 Pro. Despite its advanced reasoning architecture and top scores on AIME and SWE-bench, DeepSeek V4 Pro scored 89 points in the test, against 92 for Qwen 3.6 Plus. The test also highlighted a large cost gap: DeepSeek's Flash variant is 13 times cheaper than the Pro version, though it also scored lower.
AI IMPACT Qwen 3.6 Plus's superior performance and cost-effectiveness over newer models like DeepSeek V4 Pro suggest a shift in which LLMs are the best choice for production use.
Read on Mastodon — fosstodon.org →
- Claude Opus 4.6
- Claude Sonnet
- DeepSeek V4 Pro
- Gemini 3 Flash Preview
- GPT-5.5
- Kimi K2.6
- LLM
- OpenRouter
- Qwen 3.6 Plus
- SWE-bench
AI-generated summary · Google Gemini · from 2 sources.