PulseAugur

New research links optimizer choice to reduced forgetting in LLM finetuning

Two recent arXiv papers examine how optimizer choice shapes the training of large language models. The first reports that full fine-tuning with the same optimizer used in pre-training forgets less prior knowledge and performs better on new tasks, an effect the authors call 'optimizer-model consistency.' According to the summary, this may offer a better learning-forgetting tradeoff than other methods such as LoRA. The second paper introduces 'spectral edge analysis,' which ties phase transitions in neural network training, including grokking, capability gains, and loss plateaus, to the spectral gap of the rolling-window Gram matrix of parameter updates. The framework suggests that optimizer choice influences these dynamics, and the authors report experiments that confirm its predictions across various model sizes.

Impact: These studies offer new theoretical frameworks and empirical evidence for understanding and improving the training and fine-tuning of large language models, which could make model development more efficient and effective.

Ranking rationale: Two academic papers published on arXiv detail new findings in neural network training dynamics and optimization.


AI-generated summary · Google Gemini · from 3 sources.


Sources [3]

  1. arXiv cs.LG TIER_1 English(EN) · Yuxing Liu, Jianyu Wang, Tong Zhang

    Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

    arXiv:2605.06654v1 Announce Type: new Abstract: Optimizers play an important role in both pretraining and finetuning stages when training large language models (LLMs). In this paper, we present an observation that full finetuning with the same optimizer as in pretraining achieves…

  2. arXiv cs.LG TIER_1 English(EN) · Yongzhong Xu

    Spectral Edge Dynamics: An Analytical-Empirical Study of Phase Transitions in Neural Network Training

    arXiv:2603.28964v3 Announce Type: replace Abstract: We develop the spectral edge analysis: phase transitions in neural network training -- grokking, capability gains, loss plateaus -- are controlled by the spectral gap of the rolling-window Gram matrix of parameter updates. In th…

  3. arXiv cs.AI TIER_1 English(EN) · Tong Zhang

    Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

    Optimizers play an important role in both pretraining and finetuning stages when training large language models (LLMs). In this paper, we present an observation that full finetuning with the same optimizer as in pretraining achieves a better learning-forgetting tradeoff, i.e., fo…
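
To make the quantity named in source 2 concrete, here is a minimal sketch of a spectral gap computed from a rolling-window Gram matrix of parameter updates. It is an illustration under stated assumptions, not the paper's method: the function name `rolling_gram_spectral_gap`, the choice of window, and the definition of the gap as the difference between the two largest eigenvalues are all hypothetical, since the truncated abstract does not specify them.

```python
import numpy as np


def rolling_gram_spectral_gap(updates, window):
    """Gap between the two largest eigenvalues of a rolling-window Gram matrix.

    updates: array of shape (T, D); row t is the flattened parameter update
             at training step t (for example, theta_{t+1} - theta_t).
    window:  number of consecutive updates per Gram matrix (must be >= 2).

    ASSUMPTION: "spectral gap" is taken here as lambda_1 - lambda_2. The
    paper may define it differently (a ratio, or a gap at another index).
    """
    updates = np.asarray(updates, dtype=float)
    if window < 2 or window > updates.shape[0]:
        raise ValueError("window must satisfy 2 <= window <= number of updates")

    gaps = []
    for end in range(window, updates.shape[0] + 1):
        block = updates[end - window:end]      # (window, D) most recent updates
        gram = block @ block.T                 # (window, window) inner products
        eigvals = np.linalg.eigvalsh(gram)     # symmetric matrix, ascending order
        gaps.append(eigvals[-1] - eigvals[-2])
    return np.array(gaps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a training run: 50 steps, 200 parameters.
    fake_updates = rng.normal(size=(50, 200))
    print(rolling_gram_spectral_gap(fake_updates, window=10)[:5])
```

In this sketch, updates that all point in one direction give a rank-one Gram matrix and a large gap, while mutually orthogonal updates of equal norm give a gap of zero. Tracking the value across steps is one way to look for the kind of abrupt change the abstract associates with phase transitions.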