QUBRIC: Co-Designing Queries and Rubrics for RL Beyond Verifiable Rewards
Researchers have introduced QUBRIC, a framework that improves reinforcement learning (RL) by co-designing queries and rubrics together. It targets a bottleneck in rubric-based RL: when queries are fixed, the quality of the rubrics that can be written for them is limited by how those queries are structured. QUBRIC rewrites open-ended queries into evaluable questions, generates rubrics from the gaps between teacher and policy responses, and retains only the informative query-rubric pairs for training. The framework achieved a 5.5-point gain on the ArenaHard benchmark and showed significant improvements on legal, moral, and narrative reasoning tasks.
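As a rough illustration of the retention step, the sketch below keeps a query-rubric pair only when a teacher response outscores the current policy's response under that rubric by some margin. The `Candidate` structure, the [0, 1] score scale, and the `min_gap` threshold are hypothetical assumptions. The summary does not say how QUBRIC measures teacher-policy gaps or decides which pairs count as informative.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record: one rewritten query paired with one generated rubric.
    query: str
    rubric: str
    teacher_score: float  # assumed rubric score of a teacher response, in [0, 1]
    policy_score: float   # assumed rubric score of the policy's response, in [0, 1]


def filter_informative(candidates, min_gap=0.2):
    """Keep pairs where the teacher beats the policy by at least min_gap.

    min_gap is an illustrative threshold, not a value from the source.
    """
    return [c for c in candidates if c.teacher_score - c.policy_score >= min_gap]


# Usage: only the first pair shows a large enough teacher-policy gap.
pool = [
    Candidate("q1", "r1", teacher_score=0.9, policy_score=0.3),
    Candidate("q2", "r2", teacher_score=0.8, policy_score=0.75),
    Candidate("q3", "r3", teacher_score=0.4, policy_score=0.6),
]
kept = filter_informative(pool)
print([c.query for c in kept])  # ['q1']
```

A pair the policy already handles (small or negative gap) gives little training signal under this assumption, which is why it is dropped.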
IMPACT: Improves reinforcement learning for complex reasoning tasks whose rewards cannot be automatically verified.