Mercor has launched the APEX-Agents leaderboard on Hugging Face to evaluate open-source models. The benchmark assesses how well models perform tasks typically handled by professionals such as consultants, lawyers, and bankers. The leaderboard aims to track model progress on these complex, real-world tasks.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides a new benchmark for evaluating agentic capabilities of open-source models in professional domains.
RANK_REASON Launch of a new benchmark dataset and leaderboard for evaluating open-source models.