ArenaHard
PulseAugur coverage of ArenaHard: every cluster mentioning ArenaHard across labs, papers, and developer communities, ranked by signal.
-
QUBRIC framework co-designs queries and rubrics for advanced RL
Researchers have introduced QUBRIC, a framework that aims to improve reinforcement learning (RL) by co-designing queries and rubrics. This approach addresses a bottleneck where rubric quality is limited by fixed…
-
IBM's new 8B Granite 4.1 model outperforms older 32B MoE version
IBM has released Granite 4.1, a family of open-source language models designed for enterprise use, featuring three sizes (3B, 8B, and 30B parameters). Notably, the 8B dense model demonstrates performance matching or exc…
-
New DPO methods enhance LLM alignment with adaptive techniques
Researchers have developed several improvements to Direct Preference Optimization (DPO), a method for aligning large language models (LLMs) with human preferences. AdaDPO introduces self-adaptive coefficients to balance…