PulseAugur / Brief

last 24h
[1/1] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. How Hard is it to Rig a Benchmark? A Social Choice Analysis of Leaderboard Robustness

    Researchers have analyzed how susceptible machine learning benchmarks are to manipulation, using a social choice framing in which datasets act as voters and models as candidates. They found that deciding which benchmark data to strategically include in a model's training set so that the model reaches the top leaderboard rank is an NP-hard problem, akin to election bribery. The study introduces 'instance-level robustness' to quantify the minimum number of datasets needed for manipulation, and evaluates it on the MMLU and BIG-Bench Hard leaderboards.

    IMPACT Highlights potential for manipulation in ML leaderboards, urging caution in interpreting benchmark results.
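
    The voters-and-candidates framing can be made concrete with a toy example. The sketch below is illustrative only and is not the paper's method: it assumes a Borda-style aggregation rule, models contamination as lifting the target model to first place on a chosen dataset, and uses made-up scores. The names `borda_winner` and `min_datasets_to_rig` are hypothetical. It brute-forces the smallest number of datasets that must be rigged for a target model to win, which takes time exponential in the number of datasets in the worst case.

```python
from itertools import combinations


def borda_winner(scores):
    """scores: {dataset: {model: accuracy}}. Each dataset 'votes' by ranking models.

    Assumed rule (Borda): on each dataset the worst model gets 0 points,
    the next gets 1, and so on. Ties are broken alphabetically.
    """
    totals = {}
    for per_model in scores.values():
        ranked = sorted(per_model, key=lambda m: (per_model[m], m))  # worst to best
        for points, model in enumerate(ranked):
            totals[model] = totals.get(model, 0) + points
    return max(sorted(totals), key=lambda m: totals[m], default=None)


def min_datasets_to_rig(scores, target):
    """Smallest number of datasets to contaminate so that `target` wins.

    Assumed contamination model: training on a dataset lifts the target
    to first place on that dataset and leaves other models unchanged.
    """
    datasets = list(scores)
    for k in range(len(datasets) + 1):
        for chosen in combinations(datasets, k):
            rigged = {d: dict(s) for d, s in scores.items()}
            for d in chosen:
                rigged[d][target] = max(scores[d].values()) + 1.0
            if borda_winner(rigged) == target:
                return k
    return None


# Made-up scores for three models on three datasets (not real benchmark data).
scores = {
    "d1": {"A": 0.90, "B": 0.80, "C": 0.70},
    "d2": {"A": 0.85, "B": 0.80, "C": 0.60},
    "d3": {"A": 0.80, "B": 0.90, "C": 0.70},
}

print(borda_winner(scores))              # A
print(min_datasets_to_rig(scores, "B"))  # 1
print(min_datasets_to_rig(scores, "C"))  # 2
```

    In this toy setting the runner-up B needs only one contaminated dataset to take first place, while the last-placed C needs two. A count of this kind is one simple way to read "minimum number of datasets needed for manipulation", under the assumptions stated above.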