Ranked Ninth Globally and Second in China: Why Is DeepSeek V4 Both Loved and Hated?
DeepSeek's V4 model has shown mixed results, ranking ninth globally and second in China according to Vals AI. Some users were disappointed with it relative to its predecessor, V3, and acknowledged gaps in areas such as agentic coding and world knowledge against models like Opus 4.6 and Gemini. New testing, however, reveals V4's strengths in understanding Chinese cultural contexts: it demonstrated deep comprehension of classical Chinese poetry and cited Chinese legal statutes accurately, without hallucination. V4 also showed a nuanced understanding of internet slang and provided context-aware translations of Chinese phrases, though it did fabricate a non-existent internet meme.
AI IMPACT: Highlights the importance of culturally specific benchmarks for evaluating LLMs, potentially guiding future model development and evaluation strategies.