PulseAugur
research · [3 sources]

New research links optimizer choice to reduced forgetting in LLM finetuning

Researchers have explored how optimizer choice affects the fine-tuning of large language models. One study finds that full fine-tuning with the same optimizer used in pre-training leads to less knowledge forgetting and better performance on new tasks, a property the authors term 'optimizer-model consistency'; this approach may offer a better learning-forgetting tradeoff than alternatives such as LoRA. A second paper introduces 'spectral edge analysis' to study phase transitions in neural network training, linking phenomena such as grokking, capability gains, and loss plateaus to the spectral gap of the rolling-window Gram matrix of parameter updates. That framework suggests the choice of optimizer can influence these dynamics, and its predictions are confirmed experimentally across a range of model sizes.

Summary written by gemini-2.5-flash-lite from 3 sources.
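To make the first finding concrete: the claim is that full fine-tuning forgets less when the fine-tuning optimizer matches the one used in pre-training. Below is a minimal sketch of that comparison on a toy model; the MLP, the synthetic tasks, the learning rates, and the loss-based forgetting proxy are all illustrative assumptions, not details from the paper, which works with real LLM pretraining and finetuning runs.

```python
# Minimal sketch of the optimizer-consistency comparison on a toy model.
# The MLP, synthetic tasks, learning rates, and loss-based "forgetting" proxy
# are illustrative stand-ins; this toy is not expected to reproduce the
# paper's result, only to mirror the shape of the experiment.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(in_dim=32, n=512):
    # A synthetic regression task standing in for a data distribution.
    X = torch.randn(n, in_dim)
    W = torch.randn(in_dim, 1)
    return X, X @ W + 0.1 * torch.randn(n, 1)

def train(model, data, opt, steps=300):
    X, y = data
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(model(X), y).backward()
        opt.step()

def eval_loss(model, data):
    X, y = data
    with torch.no_grad():
        return nn.functional.mse_loss(model(X), y).item()

pretrain_task, finetune_task = make_task(), make_task()

# "Pretrain" with AdamW, then finetune copies with a matching vs. a
# mismatched optimizer and compare degradation on the pretraining task.
base = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
train(base, pretrain_task, torch.optim.AdamW(base.parameters(), lr=1e-3), steps=600)

for name, opt_fn in [("AdamW (matches pretraining)", lambda p: torch.optim.AdamW(p, lr=1e-3)),
                     ("SGD (mismatched)", lambda p: torch.optim.SGD(p, lr=1e-2))]:
    m = copy.deepcopy(base)
    train(m, finetune_task, opt_fn(m.parameters()))
    # "Forgetting" proxy: how much loss on the pretraining task degraded.
    print(f"{name}: pretrain-task loss {eval_loss(m, pretrain_task):.4f}, "
          f"finetune-task loss {eval_loss(m, finetune_task):.4f}")
```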

IMPACT These studies offer new theoretical frameworks and empirical evidence for understanding and improving the training and fine-tuning of large language models, potentially leading to more efficient and effective model development.

RANK_REASON Two academic papers published on arXiv detailing new findings in neural network training dynamics and optimization.

Read on arXiv cs.LG →

COVERAGE [3]

  1. arXiv cs.LG TIER_1 · Yuxing Liu, Jianyu Wang, Tong Zhang

    Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

    arXiv:2605.06654v1 Announce Type: new Abstract: Optimizers play an important role in both pretraining and finetuning stages when training large language models (LLMs). In this paper, we present an observation that full finetuning with the same optimizer as in pretraining achieves…

  2. arXiv cs.LG TIER_1 · Yongzhong Xu

    Spectral Edge Dynamics: An Analytical-Empirical Study of Phase Transitions in Neural Network Training

    arXiv:2603.28964v3 Announce Type: replace Abstract: We develop the spectral edge analysis: phase transitions in neural network training -- grokking, capability gains, loss plateaus -- are controlled by the spectral gap of the rolling-window Gram matrix of parameter updates. In th… (a sketch of this Gram-matrix quantity follows the coverage list)

  3. arXiv cs.AI TIER_1 · Tong Zhang

    Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

    Optimizers play an important role in both pretraining and finetuning stages when training large language models (LLMs). In this paper, we present an observation that full finetuning with the same optimizer as in pretraining achieves a better learning-forgetting tradeoff, i.e., fo…
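The central quantity in the second paper can be made concrete. Below is a minimal sketch, in NumPy, of one plausible reading of the abstract's 'spectral gap of the rolling-window Gram matrix of parameter updates'; the `SpectralEdgeTracker` class, the window size, and the gap definition (difference of the top two eigenvalues) are assumptions for illustration, not constructions taken from the paper.

```python
# Sketch of the "rolling-window Gram matrix of parameter updates" quantity
# described in the spectral-edge abstract. Window size, normalization, and
# the gap definition (difference of the top two eigenvalues) are assumptions;
# the paper's exact construction may differ.
from collections import deque
import numpy as np

class SpectralEdgeTracker:
    """Tracks recent parameter updates and their Gram-matrix spectral gap."""
    def __init__(self, window: int = 16):
        self.updates = deque(maxlen=window)  # last `window` flattened update vectors

    def record(self, prev_params: np.ndarray, new_params: np.ndarray):
        self.updates.append((new_params - prev_params).ravel())

    def spectral_gap(self) -> float:
        # Gram matrix G[i, j] = <u_i, u_j> over the window (k x k for k updates).
        U = np.stack(list(self.updates))             # shape (k, d)
        G = U @ U.T
        eigs = np.sort(np.linalg.eigvalsh(G))[::-1]  # eigenvalues, descending
        return float(eigs[0] - eigs[1])              # gap between the top two modes

# Toy usage: pure random-walk updates give a small gap; adding a shared drift
# direction (a dominant update mode) widens it.
rng = np.random.default_rng(0)
tracker = SpectralEdgeTracker(window=8)
theta = rng.normal(size=100)
drift = rng.normal(size=100)
for _ in range(8):
    new_theta = theta + 0.01 * rng.normal(size=100) + 0.1 * drift
    tracker.record(theta, new_theta)
    theta = new_theta
print(f"spectral gap over window: {tracker.spectral_gap():.4f}")
```

On this reading, a widening gap signals a single dominant update direction within the window, the kind of low-rank structure the abstract associates with phase transitions such as grokking.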