Researchers have analyzed how susceptible machine learning benchmarks are to manipulation, treating datasets as voters and models as candidates. They found that choosing which benchmark datasets to include in a model's training set so that it reaches the top leaderboard rank is an NP-hard problem, akin to election bribery. The study introduces 'instance-level robustness' to quantify the minimum number of datasets needed for manipulation and evaluates it on the MMLU and BIG-Bench Hard leaderboards.
IMPACT Highlights potential for manipulation in ML leaderboards, urging caution in interpreting benchmark results.
RANK_REASON The cluster contains an academic paper analyzing machine learning benchmarks.
AI-generated summary · Google Gemini · from 3 sources.