Brief

last 24h

[2/2] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.AI English(EN) · 8h

Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging

A new research paper introduces the "Sparsity Curse": Reinforcement Learning with Verifiable Reward (RLVR) models, despite their strong reasoning capabilities, are difficult to merge because their parameter updates are sparse and spread out. Unlike Supervised Fine-Tuning (SFT) models, which merge easily, RLVR models exhibit fragile, near-orthogonal parameter updates, and performance degrades when they are combined using standard methods. To address this, the researchers propose SAR-Merging, a technique that uses Fisher Information and magnitude-aware sparsification to preserve the reasoning pathways of RLVR models; they report improved performance on mathematical and coding benchmarks.

IMPACT This research could lead to more effective methods for combining specialized AI models, potentially accelerating the development of more capable and versatile AI systems.

RESEARCH · arXiv cs.CL English(EN) · 1mo

Reward Modeling from Natural Language Human Feedback

Researchers have introduced Reward Modeling from Natural Language Human Feedback (RM-NLHF), a method to improve the training of Generative Reward Models (GRMs). Training on pairwise preference data can lead GRMs to guess the correct outcome without genuine understanding, which introduces noise into the training signal. RM-NLHF addresses this by using natural language critiques from humans to provide more accurate process reward signals, which are then used to train GRMs. The approach also includes a Meta Reward Model (MetaRM) to generalize from limited human critiques to larger datasets.

IMPACT Improves training signal accuracy for reward models, potentially leading to more robust and reliable AI systems.
