Several leading AI labs have released new open-source models, including DeepSeek V4, Gemma 4, Kimi K2.6, and MiMo 2.5. An assessment by CAISI suggests that these open models lag behind frontier closed models and that the gap is widening. However, the evaluation methodology and the limitations of the benchmarks are debated: some argue that standardized tests do not fully capture real-world capabilities, especially on complex tasks such as coding.
Impact: New open models challenge frontier capabilities, sparking debate on benchmark validity and the true performance gap.
Ranking rationale: The cluster discusses new open-source model releases and their comparative benchmark performance, including critiques of the evaluation methodologies.
Read on Interconnects (Nathan Lambert) →
- CAISI
- DeepSeek
- DeepSeek V4
- Epoch AI
- GLM-5.1
- Kimi K2.6
- MiMo 2.5
- Moonshot AI
- Poolside AI
- Xiaomi
- Gemma 4
AI-generated summary · Google Gemini · from 3 sources.