PulseAugur

Pure code script outperforms LLMs on ARC-AGI-3 benchmark

A programmer reports that a simple Python script, running on a 2012 AMD CPU, scores 4.76% on the new ARC-AGI-3 benchmark. The result highlights how inefficient current large language models are on this benchmark: they struggle with its dynamic, instruction-less environments and often score zero. The script uses basic computer vision techniques such as centroid detection to solve spatial puzzles, and it outperforms many AI models while requiring minimal compute and no AI tokens.
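To make "centroid detection" concrete, here is a minimal sketch in plain Python. It is not the script from the Reddit post, whose code is not shown here. It assumes a frame is a small 2D grid of integer color codes with 0 as background, and the move names ("up", "down", "left", "right", "stay") are hypothetical placeholders rather than the benchmark's real action interface.

```python
from collections import defaultdict


def color_centroids(grid):
    """Return {color: (row, col)}, the mean position of each non-background color."""
    sums = defaultdict(lambda: [0, 0, 0])  # per color: row sum, column sum, cell count
    for r, row in enumerate(grid):
        for c, color in enumerate(row):
            if color != 0:  # assumption: 0 marks the background
                acc = sums[color]
                acc[0] += r
                acc[1] += c
                acc[2] += 1
    return {color: (rs / n, cs / n) for color, (rs, cs, n) in sums.items()}


def step_toward(src, dst):
    """Pick a one-cell move along the axis with the larger gap (hypothetical action names)."""
    dr, dc = dst[0] - src[0], dst[1] - src[1]
    if dr == 0 and dc == 0:
        return "stay"
    if abs(dr) >= abs(dc):
        return "down" if dr > 0 else "up"
    return "right" if dc > 0 else "left"


# Toy frame: a 2x2 block of color 1 and a 2-cell column of color 2.
frame = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 0, 2],
]

centroids = color_centroids(frame)
move = step_toward(centroids[1], centroids[2])
print(centroids)  # {1: (1.5, 1.5), 2: (3.5, 4.0)}
print(move)       # right
```

The point of the sketch is the cost profile: one pass over the grid and a few comparisons per frame, which is why an approach like this can run on old hardware without any model calls.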

IMPACT Demonstrates that traditional programming can outperform current LLMs on specific benchmarks, highlighting LLM inefficiency.

RANK_REASON The cluster describes a novel approach to a benchmark, demonstrating a non-AI method's performance against AI models.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

  1. r/MachineLearning TIER_1 English(EN) · /u/-SLOW-MO-JOHN-D

    Scrap the LLMs. Scoring 4.76% on the brand new ARC-3 using pure code, a 2012 AMD CPU, and zero AI tokens.[P]

    https://www.reddit.com/r/MachineLearning/comments/1tx6g3i/scrap_the_llms_scoring_476_on_the_brand_new_arc3/

  2. r/Anthropic TIER_1 English(EN) · /u/-SLOW-MO-JOHN-D

    Scrap the LLMs. Scoring 4.76% on the brand new ARC-3 using pure code, a 2012 AMD CPU, and zero AI tokens.[P]

    https://www.reddit.com/r/Anthropic/comments/1tx6hd2/scrap_the_llms_scoring_476_on_the_brand_new_arc3/