PulseAugur

LLM RLVR training activates memorization shortcuts, researchers find

Researchers have identified a "Perplexity Paradox" in Large Language Models (LLMs) trained with Reinforcement Learning with Verifiable Rewards (RLVR). The paradox arises when models achieve performance gains despite receiving spurious or incorrect rewards, which indicates a shift from reasoning to memorization. The study describes a specific "Anchor-Adapter" circuit that facilitates this shortcut: functional anchors in the middle layers and structural adapters in the later layers. The research also demonstrates that scaling specific MLP keys within this circuit can causally steer the model's behavior, offering a method to identify and mitigate data contamination in RLVR-tuned models.
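To make the idea of "scaling MLP keys" concrete, here is a minimal toy sketch. It assumes the common key-value reading of an MLP layer, in which each hidden unit has a key vector (matched against the input) and a value vector (written to the output). The two-unit layer, the `mlp_forward` function, and the `key_scale` parameter are hypothetical illustrations, not the paper's model, code, or exact intervention.

```python
# Toy illustration only: a hypothetical 2-key MLP, not the paper's model or code.

def relu(x):
    return x if x > 0.0 else 0.0

def mlp_forward(x, keys, values, key_scale=None):
    """Key-value view of an MLP: out = sum_i scale_i * relu(k_i . x) * v_i."""
    key_scale = key_scale or {}
    out = [0.0] * len(values[0])
    for i, (k, v) in enumerate(zip(keys, values)):
        # Key activation: how strongly the input matches key i.
        act = relu(sum(kj * xj for kj, xj in zip(k, x)))
        # Intervention (assumed form): rescale the activation of selected keys.
        act *= key_scale.get(i, 1.0)
        # Each key contributes its value vector, weighted by the activation.
        for d in range(len(out)):
            out[d] += act * v[d]
    return out

keys = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical key vectors
values = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical value vectors
x = [1.0, 2.0]

baseline = mlp_forward(x, keys, values)                        # no intervention
suppressed = mlp_forward(x, keys, values, key_scale={1: 0.0})  # silence key 1
amplified = mlp_forward(x, keys, values, key_scale={1: 2.0})   # double key 1
print(baseline, suppressed, amplified)
```

In this toy setting, setting a key's scale to 0 removes its contribution to the output and a scale above 1 amplifies it. That is the sense in which scaling a key can act as a causal intervention on what the layer writes; how the paper selects keys and chooses scale factors is not stated in this summary.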

IMPACT Provides a mechanistic understanding of how LLMs can be steered towards memorization over reasoning during RLVR training, potentially impacting future model alignment and safety research.

RANK_REASON The cluster contains an academic paper detailing a new mechanistic understanding of LLM behavior.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CL TIER_1 English(EN) · Lecheng Yan, Ruizhe Li, Guanhua Chen, Qing Li, Jiahui Geng, Wenxi Li, Longyue Wang, Chenyang Lyu

    Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs

    arXiv:2601.11061v2 Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is highly effective for enhancing LLM reasoning, yet recent evidence shows models like Qwen 2.5 achieve significant gains even with spurious or incorrect rewards. We in…