New method rejuvenates LLM plasticity for better RL after SFT

By PulseAugur Editorial · [1 source] · 2026-06-10 04:00

Researchers have identified a phenomenon they call "model plasticity loss," which limits how much Reinforcement Learning (RL) can improve a large language model after Supervised Fine-Tuning (SFT). Excessive SFT can produce over-confident token distributions and a difficult optimization landscape, limiting RL's ability to further enhance model capabilities. To address this, they propose a method called "Rejuvenation," which uses base-anchored model fusion and targeted neuron resets to restore plasticity while retaining the benefits of SFT. The method reportedly improves performance on reasoning and agentic tasks.
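The summary does not spell out how the fusion or the resets are performed, so the following is only a minimal sketch of one plausible reading: fusion as linear interpolation of SFT weights toward the base checkpoint, and a reset as copying base values back into selected positions. The function names, the `alpha` coefficient, the toy weight dictionaries, and the choice of which indices to reset are all hypothetical and are not taken from the paper, whose actual procedure may differ.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def fuse_toward_base(base: Weights, sft: Weights, alpha: float) -> Weights:
    """Linearly interpolate each parameter between base and SFT weights.

    alpha = 0.0 returns the base weights, alpha = 1.0 returns the SFT weights.
    (Assumed form of "base-anchored fusion"; not confirmed by the source.)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return {
        name: [(1.0 - alpha) * b + alpha * s for b, s in zip(base[name], sft[name])]
        for name in sft
    }


def reset_selected(fused: Weights, base: Weights, name: str, indices: List[int]) -> Weights:
    """Copy base values back into chosen positions of one parameter.

    (Assumed form of a "targeted reset"; how targets are chosen is not modeled.)
    """
    out = {key: list(values) for key, values in fused.items()}
    for i in indices:
        out[name][i] = base[name][i]
    return out


# Hypothetical toy checkpoints with a single three-element parameter.
base = {"w": [0.0, 1.0, 2.0]}
sft = {"w": [1.0, 1.0, 4.0]}

fused = fuse_toward_base(base, sft, alpha=0.5)
rejuvenated = reset_selected(fused, base, "w", [2])
```

In this toy setup the fused parameter sits halfway between the two checkpoints, and the reset position returns exactly to its base value while the other positions keep their fused values.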

IMPACT Addresses a key limitation in LLM training pipelines, potentially improving model performance on complex tasks.

RANK_REASON Academic paper proposing a new method for LLM training.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Runze Liu, Jiashun Liu, Xu Wan, Yuqian Fu, Ling Pan · 2026-06-10 04:00

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

arXiv:2606.09932v1 Announce Type: cross Abstract: Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has become a standard pipeline for Large Language Model (LLM) post-training. SFT is expected to provide a useful behavioral prior for RL to further enhance model…