A new paper reveals a significant gap between the capabilities of the AI models evaluated in academic research and those of the frontier models available at the time. The study found that the median research paper evaluates models approximately 10.85 points behind the state of the art on the Epoch AI Capabilities Index (ECI), and that the gap is widening each year. The paper attributes this "publication elicitation gap" to more than peer-review latency: a substantial portion stems from the use of older or less capable models and from insufficient disclosure of evaluation configurations.
Impact: Highlights a systemic issue in AI evaluation that could misinform policy and investment decisions by understating current capabilities.
Ranking rationale: This is a research paper analyzing academic evaluations of AI models.
- Claude Opus 4.5
- Claude Opus 4.7
- Claude Sonnet 3.7
- Epoch AI Capabilities Index
- GPT-4o-mini
- GPT-5.5 Pro
- VERSIO-AI
AI-generated summary · Google Gemini · from 2 sources.