Brief · PulseAugur

RESEARCH · arXiv cs.AI English(EN) · 1w · [2 sources]

QUBRIC: Co-Designing Queries and Rubrics for RL Beyond Verifiable Rewards

Researchers have introduced QUBRIC, a framework that improves reinforcement learning (RL) by co-designing queries and rubrics together. It targets a bottleneck in rubric-based RL: when queries are fixed, their structure limits how good the rubrics can be. QUBRIC rewrites open-ended queries into evaluable questions, generates rubrics from gaps between a teacher and the policy, and retains only the informative query-rubric pairs for training. The framework achieved a 5.5-point gain on the ArenaHard benchmark and showed significant improvements on legal, moral, and narrative reasoning tasks.
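The retention step can be illustrated with a minimal sketch. The brief does not say how QUBRIC decides which pairs are informative, so everything below is an assumption made for illustration: the `Pair` type, the keyword-matching `rubric_score`, the `retain_informative` helper, and the `min_gap` threshold are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pair:
    """A rewritten query together with its rubric criteria (hypothetical type)."""
    query: str
    rubric: List[str]


def rubric_score(response: str, rubric: List[str]) -> float:
    """Toy scorer (assumption): fraction of rubric keywords found in the response."""
    if not rubric:
        return 0.0
    text = response.lower()
    hits = sum(1 for criterion in rubric if criterion.lower() in text)
    return hits / len(rubric)


def retain_informative(
    pairs: List[Pair],
    teacher: Callable[[str], str],
    policy: Callable[[str], str],
    min_gap: float = 0.25,  # hypothetical threshold, not from the source
) -> List[Pair]:
    """Keep pairs where the teacher outscores the policy by at least min_gap."""
    kept = []
    for pair in pairs:
        teacher_score = rubric_score(teacher(pair.query), pair.rubric)
        policy_score = rubric_score(policy(pair.query), pair.rubric)
        if teacher_score - policy_score >= min_gap:
            kept.append(pair)
    return kept


# Stand-in teacher and policy that return fixed answers (illustration only).
pairs = [
    Pair("Is the clause enforceable, and why?", ["consideration", "jurisdiction"]),
    Pair("Summarize the story in one line.", ["protagonist"]),
]
teacher = lambda q: "It turns on consideration and jurisdiction; the protagonist wins."
policy = lambda q: "The protagonist wins."

kept = retain_informative(pairs, teacher, policy)
print([p.query for p in kept])
```

In this toy setup the second pair is dropped because teacher and policy score the same on its rubric, so it carries no training signal under the assumed criterion. A real system would replace the keyword scorer with a rubric-based judge.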

IMPACT Extends reinforcement learning to complex reasoning tasks where rewards cannot be verified automatically.

reinforcement learning
ArenaHard
QUBRIC