How much of MLE-Bench's gains are the algorithm vs. better models + more search? [R]
A new benchmark, FML-Bench, suggests that recent gains in automated machine learning research, particularly for code-editing agents, are not primarily due to algorithmic advances. When model capability and search budget are held fixed, older algorithms such as AIDE perform comparably to newer systems. This suggests that much of the observed progress comes from stronger base models and changes in how problems are defined, rather than from fundamentally more efficient algorithms.
IMPACT: Challenges the narrative of rapid algorithmic progress in ML and suggests that the drivers of performance gains need to be re-evaluated.