Brief

last 24h

[2/2] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

RESEARCH · arXiv cs.CL English(EN) · 4d · [2 sources]

Metadata Predictability Is Not Evidence Dependence: An Intervention-Based Audit for Weak-Label Benchmarks

Researchers have developed a new auditing protocol for weak-label benchmarks in natural language processing. This protocol distinguishes between outputs predictable from metadata alone and those genuinely dependent on the provided evidence. By combining a metadata prior dominance score with an evidence intervention statistic, the method aims to provide a more robust evaluation of benchmark reliability.

IMPACT Introduces a more rigorous method for evaluating NLP benchmarks, potentially improving the reliability of AI model performance assessments.
- HotpotQA
- FEVER
- SNLI

RESEARCH · arXiv cs.CL English(EN) · 1w · [3 sources]

CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution

Researchers have developed CANTANTE, a new framework designed to optimize the configuration of large language model-based multi-agent systems. This system addresses the challenge of assigning credit for performance when only system-level scores are available, by decomposing rewards into per-agent update signals. CANTANTE was evaluated on programming, mathematical reasoning, and question-answering tasks, where it demonstrated superior performance compared to existing methods and unoptimized prompts, while also incurring lower inference costs.

IMPACT Introduces a novel method for optimizing multi-agent LLM systems, potentially improving performance and efficiency in complex tasks.
- LLM
- MBPP
- GSM8K
- HotpotQA
- MIPROv2
