A new chart visualizes the performance history of major AI models, tracking their capabilities over time rather than just their latest release. This tool aims to expose hidden trends like performance degradation or "nerfs" that can occur after a model's initial launch. The data is sourced daily from the LMSYS Arena Leaderboard, which uses crowdsourced human evaluations to provide a robust measure of model performance. AI
影响 Provides a tool for operators to track model degradation and understand performance nuances beyond initial release benchmarks.
排序理由 The cluster describes a new visualization tool for tracking AI model performance over time, based on existing benchmark data. [lever_c_demoted from research: ic=1 ai=1.0]
在 Hacker News — AI stories ≥50 points 阅读 →
AI 生成摘要 · Google Gemini · 来自 1 个来源。 我们如何撰写摘要 →