Researchers have explored the impact of optimizer consistency during the fine-tuning of large language models. One study suggests that using the same optimizer for both pre-training and fine-tuning leads to less knowledge forgetting and better performance on new tasks, a phenomenon termed 'optimizer-model consistency,' and that this approach may offer a better learning-forgetting tradeoff than alternatives such as LoRA. Another paper introduces 'spectral edge analysis' to study phase transitions in neural network training, linking phenomena such as grokking and sudden capability gains to the spectral gap of parameter update matrices. This framework suggests that the choice of optimizer influences these dynamics, with experimental results confirming its predictions across a range of model sizes.
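As a rough illustration of the quantity the second paper studies, one common definition of a matrix's spectral gap is the difference between its two largest singular values. The sketch below computes that gap for a parameter update matrix ΔW = W_after − W_before; note that the function name, the use of singular values, and the toy matrices are all assumptions for illustration, not the paper's actual method or definition.

```python
import numpy as np

def spectral_gap(delta_w: np.ndarray) -> float:
    """Gap between the two largest singular values of an update matrix.

    This is one plausible reading of 'spectral gap'; the paper's exact
    definition (e.g. eigenvalues of a different operator) may differ.
    """
    # np.linalg.svd returns singular values sorted in descending order.
    s = np.linalg.svd(delta_w, compute_uv=False)
    return float(s[0] - s[1])

# Toy example: a small random weight matrix before and after an update step.
rng = np.random.default_rng(0)
w_before = rng.normal(size=(64, 64))
w_after = w_before + 0.01 * rng.normal(size=(64, 64))
print(spectral_gap(w_after - w_before))
```

A large gap would indicate that one direction in parameter space dominates the update; tracking how this quantity evolves over training steps is the kind of diagnostic such an analysis might use.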
Summary written by gemini-2.5-flash-lite from 3 sources.
IMPACT These studies offer new theoretical frameworks and empirical evidence for understanding and improving the training and fine-tuning of large language models, potentially leading to more efficient and effective model development.
RANK_REASON Two academic papers published on arXiv detailing new findings in neural network training dynamics and optimization.