Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging
A new research paper introduces the "Sparsity Curse": models trained with Reinforcement Learning with Verifiable Reward (RLVR), despite their strong reasoning capabilities, are difficult to merge because their parameter updates are sparse and spread out across the network. Supervised Fine-Tuning (SFT) models merge easily, but the updates of different RLVR models are fragile and nearly orthogonal to one another, so combining them with standard merging methods degrades performance. To address this, the researchers propose SAR-Merging, a technique that uses Fisher Information and magnitude-aware sparsification to preserve the reasoning pathways unique to each RLVR model, and they report improved performance on mathematical and coding benchmarks.
AI IMPACT: This research could lead to more effective methods for combining specialized AI models, potentially accelerating the development of more capable and versatile AI systems.
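To make the ideas in the summary concrete, here is a minimal Python sketch of what "near-orthogonal updates" and "Fisher- and magnitude-aware sparsification" could look like on flat parameter vectors. It is not the paper's SAR-Merging algorithm, whose details are not given here. The function names, the `fisher * |delta|` score, the `keep_ratio` parameter, and the Fisher-weighted averaging rule are all assumptions chosen for illustration.

```python
import numpy as np


def task_vector(finetuned, base):
    """Parameter update of a fine-tuned model relative to its base (flat 1-D arrays)."""
    return finetuned - base


def cosine(a, b):
    """Cosine similarity between two updates; values near 0 mean near-orthogonal."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def sparsify(delta, fisher, keep_ratio):
    """Keep the top `keep_ratio` fraction of entries, ranked by fisher * |delta|.

    Assumption: this score is an illustrative stand-in, not the paper's criterion.
    """
    score = fisher * np.abs(delta)
    k = max(1, int(round(keep_ratio * delta.size)))
    keep = np.argpartition(score, -k)[-k:]  # indices of the k highest scores
    mask = np.zeros(delta.size, dtype=bool)
    mask[keep] = True
    return np.where(mask, delta, 0.0), mask


def merge(base, deltas, fishers, keep_ratio=0.1):
    """Hypothetical merge: sparsify each update, then Fisher-weight any overlaps."""
    total = np.zeros_like(base, dtype=float)
    weight = np.zeros_like(base, dtype=float)
    for delta, fisher in zip(deltas, fishers):
        kept, mask = sparsify(delta, fisher, keep_ratio)
        total += fisher * kept
        weight += fisher * mask
    # Entries kept by no model (weight == 0) receive no update at all.
    merged = np.divide(total, weight, out=np.zeros_like(total), where=weight > 0)
    return base + merged
```

With two updates that touch different coordinates, `cosine` returns a value close to zero, and `merge` keeps each model's large entries at full strength. Plain averaging would instead halve every entry that only one model changed, which is one intuition for why standard merging can hurt sparse, non-overlapping updates.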