Mercor has launched the APEX-Agents leaderboard on Hugging Face to evaluate open-source models. The benchmark assesses how well models perform tasks typically handled by professionals such as consultants, lawyers, and bankers. The leaderboard aims to track model progress on these complex, real-world tasks.
AI IMPACT Provides a new benchmark for evaluating the agentic capabilities of open-source models in professional domains.
RANK_REASON Launch of a new benchmark dataset and leaderboard for evaluating open-source models.
AI-generated summary · Google Gemini · from 1 source.