Brief

last 24h

221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
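The digest does not publish its scoring formula. The sketch below shows one plausible way to combine these four signals. It assumes a weighted sum of three normalized signals (authority, cluster strength, headline signal) multiplied by an exponential time decay. The weights, the 24-hour half-life, and the `Signals` / `score` names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights: the digest does not publish its actual formula.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 24.0  # assumed half-life for the time-decay term


@dataclass
class Signals:
    authority: float         # 0..1, hypothetical source-reputation score
    cluster_strength: float  # 0..1, e.g. normalized count of deduplicated sources
    headline_signal: float   # 0..1, hypothetical headline-quality score
    age_hours: float         # hours since the story first appeared


def _clamp(x: float) -> float:
    """Keep each signal inside [0, 1] so the final score stays in 0..100."""
    return max(0.0, min(1.0, x))


def score(s: Signals) -> float:
    """Weighted sum of signals, scaled to 0..100 and halved every HALF_LIFE_HOURS."""
    base = (WEIGHTS["authority"] * _clamp(s.authority)
            + WEIGHTS["cluster_strength"] * _clamp(s.cluster_strength)
            + WEIGHTS["headline_signal"] * _clamp(s.headline_signal))
    decay = 0.5 ** (max(0.0, s.age_hours) / HALF_LIFE_HOURS)
    return round(100 * base * decay, 1)


print(score(Signals(1, 1, 1, 0)))   # fresh story with maximal signals
print(score(Signals(1, 1, 1, 24)))  # same story one half-life later
```

With these assumed weights, a story with maximal signals scores 100 when fresh and 50 after one half-life. Exponential decay keeps older items visible without letting them outrank fresh clusters of similar strength.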

RESEARCH · Hugging Face Blog · English (EN) · 1w · 3 sources

The Open Agent Leaderboard

Hugging Face has launched the Open Agent Leaderboard, a new framework for evaluating both the performance and the cost of AI agent systems. Rather than measuring only the capabilities of the underlying model, the leaderboard assesses how well an agent generalizes across diverse tasks and settings. It uses six established benchmarks, including SWE-Bench Verified and AppWorld, to test agents on coding, customer service, and research tasks, giving a more holistic view of their real-world applicability.

IMPACT Provides a standardized way to evaluate AI agent generality and cost, which may steer development toward more practical applications.

RESEARCH · arXiv cs.IR (Information Retrieval) · English (EN) · 1w · 4 sources

Argus: Evidence Assembly for Scalable Deep Research Agents

Researchers have developed Argus, an agentic system that improves deep research by treating evidence gathering like assembling a jigsaw puzzle. Parallel search methods often retrieve duplicate information. Argus avoids this by pairing two roles: Searchers collect evidence traces, and a Navigator maintains an evidence graph, identifies missing pieces, and synthesizes the final answer. With 64 Searchers, Argus scores 86.2 on BrowseComp, outperforming proprietary agents while keeping its context window manageable.

IMPACT Argus demonstrates a novel approach to evidence assembly for AI agents, potentially improving efficiency and performance on complex research tasks.
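The Searcher/Navigator division of labor described above can be illustrated with a toy evidence graph. This is a minimal sketch, not the Argus implementation. It assumes that the question decomposes into named "pieces", that each evidence trace is tagged with the piece it answers, and that duplicates are detected by exact claim match. `Evidence`, `EvidenceGraph`, `navigate`, and the `search` callback are hypothetical names, and no real search or LLM call is made.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass, field


@dataclass
class Evidence:
    piece: str   # which puzzle piece (sub-question) this trace answers (assumed schema)
    claim: str   # the extracted fact
    source: str  # where the Searcher found it


@dataclass
class EvidenceGraph:
    required: set[str]  # pieces the Navigator needs before it can answer
    nodes: dict[str, list[Evidence]] = field(default_factory=dict)

    def add(self, ev: Evidence) -> bool:
        """Attach a trace to its piece; return False if the claim is a duplicate."""
        bucket = self.nodes.setdefault(ev.piece, [])
        if any(e.claim == ev.claim for e in bucket):
            return False
        bucket.append(ev)
        return True

    def missing(self) -> set[str]:
        """Pieces that still have no supporting evidence."""
        return {p for p in self.required if not self.nodes.get(p)}


def navigate(graph: EvidenceGraph,
             search: Callable[[str], Iterable[Evidence]],
             max_rounds: int = 3) -> set[str]:
    """Navigator loop: dispatch a Searcher only for missing pieces, then re-check."""
    for _ in range(max_rounds):
        gaps = graph.missing()
        if not gaps:
            break
        for piece in sorted(gaps):  # one hypothetical Searcher call per gap
            for ev in search(piece):
                graph.add(ev)
    return graph.missing()  # empty set means the puzzle is complete


def fake_search(piece: str) -> list[Evidence]:
    """Stand-in for a real Searcher agent."""
    return [Evidence(piece, f"fact about {piece}", "example.org")]


g = EvidenceGraph(required={"who", "when", "where"})
g.add(Evidence("who", "fact about who", "example.org"))
print(navigate(g, fake_search))
```

Unlike blind parallel search, this loop searches only for pieces the graph still lacks. This mirrors the efficiency argument in the summary, though the paper's actual gap detection is likely far richer than an empty-bucket check.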
