PulseAugur

METR AI time horizons graph riddled with severe errors, analysis finds

A recent analysis by Nathan Witkin, a research writer at NYU Stern’s Tech and Society Lab, identifies numerous severe errors in the widely cited METR AI time horizons graph. The flaws he describes include fabricated human baseline data, hourly pay that gave human benchmarkers an incentive to take longer, a biased sample of human testers, and potential contamination between test and training data. Witkin argues that these inaccuracies make the graph unreliable for drawing meaningful conclusions about AI capabilities and their impact on tasks like software development.

Impact: Critiques of widely cited AI capability graphs highlight the need for rigorous scientific standards and can influence how AI progress is perceived.

Ranking rationale: The cluster discusses a critique of a previously published graph, rather than a new release or research finding.

Read on r/MachineLearning →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

  1. r/MachineLearning · TIER_1 · English (EN) · /u/common_yarrow

    The famous METR AI time horizons graph contains many severe errors [D]

    Nathan Witkin, a research writer at NYU Stern’s Tech and Society Lab, writes (https://www.transformernews.ai/p/against-the-metr-graph-coding-capabilities-software-jobs-task-ai) damningly about the famous METR AI time horizons graph in…