Pro-KLShampoo optimizer improves LLM pre-training with spectral structure analysis

By PulseAugur Editorial Team · [1 source] · 2026-05-08 04:00

Researchers have developed Pro-KLShampoo, an optimizer for more efficient LLM pre-training that combines explicit gradient preconditioning with orthogonalization. It builds on an observation about KL-Shampoo: the eigenvalue spectra of its preconditioners have a spike-and-flat shape, with a few large eigenvalues standing out from a roughly flat remainder. Pro-KLShampoo therefore keeps explicit spectral structure only within a tracked subspace and applies orthogonalization to the remaining directions. Across multiple pre-training scales, including GPT-2 and LLaMA models, it outperformed standard KL-Shampoo on validation loss, memory usage, and training time.
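The summary does not spell out Pro-KLShampoo's update rule, so the NumPy sketch below is only a hypothetical illustration of the general idea it describes: eigendecompose a one-sided preconditioner factor, whiten the gradient explicitly inside a tracked top-`k` eigenspace (the "spike"), and orthogonalize the gradient's component in the complementary "flat" directions instead of whitening it there. The function names, the rank `k`, the damping `eps`, and the use of an SVD polar factor for orthogonalization are all assumptions; this is not the authors' algorithm.

```python
import numpy as np


def polar_factor(a: np.ndarray) -> np.ndarray:
    """Semi-orthogonal polar factor U @ Vt from the thin SVD of `a`."""
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    return u @ vt


def projected_whiten_orthogonalize(
    grad: np.ndarray, left_factor: np.ndarray, k: int, eps: float = 1e-8
) -> np.ndarray:
    """HYPOTHETICAL one-sided update (not the paper's algorithm).

    grad:        (m, n) gradient matrix
    left_factor: (m, m) symmetric PSD preconditioner estimate,
                 e.g. a running average of grad @ grad.T
    k:           rank of the tracked subspace, 0 < k < m
    """
    m = grad.shape[0]
    assert left_factor.shape == (m, m) and 0 < k < m
    # eigh returns eigenvalues in ascending order, so the top-k "spike"
    # directions are the last k eigenvector columns.
    eigvals, eigvecs = np.linalg.eigh(left_factor)
    split = m - k
    top, top_vals = eigvecs[:, split:], eigvals[split:]
    rest = eigvecs[:, :split]  # orthonormal basis of the "flat" complement

    # Tracked subspace: explicit whitening, scaling each coordinate
    # by (lambda + eps)^(-1/2); negative round-off is clipped to zero.
    top_coords = top.T @ grad  # shape (k, n)
    scale = 1.0 / np.sqrt(np.maximum(top_vals, 0.0) + eps)
    whitened = top @ (top_coords * scale[:, None])

    # Complement: orthogonalize the gradient's coordinates there instead of
    # using the untracked eigenvalues, then map back to the full space.
    rest_coords = rest.T @ grad  # shape (m - k, n)
    orthogonalized = rest @ polar_factor(rest_coords)

    return whitened + orthogonalized


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.standard_normal((4, 6))
    update = projected_whiten_orthogonalize(g, g @ g.T, k=2)
    # With an exact factor g @ g.T, the sketch reduces to the polar factor of g.
    print(np.allclose(update, polar_factor(g), atol=1e-6))  # True
```

One easy-to-check property of this sketch: when `left_factor` is exactly `grad @ grad.T` (with `grad` of full row rank and `eps` negligible), the combined output equals the full whitening (G Gᵀ)^(-1/2) G, which is the polar factor of G, for any valid `k`. Whether the "whitening recovered by orthogonalization" in the paper's title rests on the same identity is not stated in the summary.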

Impact: Introduces a more efficient optimization method that could reduce the compute cost of LLM pre-training.

Ranking rationale: Academic paper introducing a novel optimization technique for LLM pre-training.

Read on arXiv cs.LG

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.LG · Tier 1 · English (EN) · Ruotong Sun, Ermin Wei · 2026-05-08 04:00

Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization

arXiv:2605.06316v1 · Announce type: new

Abstract: Optimizers that exploit the matrix structure of gradients are central to modern LLM pre-training, with two distinct frontiers: explicit Kronecker-factored preconditioning, most recently KL-Shampoo, which estimates the precondition…
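For readers new to the first frontier named in the abstract, the NumPy sketch below shows Kronecker-factored preconditioning in the form introduced by the original Shampoo optimizer, not KL-Shampoo, whose estimation rule is cut off in the abstract. For an m×n gradient G it accumulates a left factor L += G Gᵀ and a right factor R += Gᵀ G and returns L^(-1/4) G R^(-1/4), applying each side separately instead of forming the full mn×mn Kronecker product. The class and function names, the plain-sum accumulation, and the `eps` damping value are illustrative choices.

```python
import numpy as np


def matrix_power_psd(a: np.ndarray, power: float, eps: float = 1e-6) -> np.ndarray:
    """a**power for a symmetric PSD matrix via eigendecomposition, damped by eps."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.maximum(vals, 0.0) + eps  # clip round-off negatives, then damp
    # vecs * vals**power scales column j by vals[j]**power: V diag(v^p) V^T.
    return (vecs * vals**power) @ vecs.T


class ShampooState:
    """Kronecker factors for one (m, n) weight matrix, as in the original Shampoo."""

    def __init__(self, m: int, n: int, eps: float = 1e-6):
        self.eps = eps
        self.left = eps * np.eye(m)   # accumulates G @ G.T, shape (m, m)
        self.right = eps * np.eye(n)  # accumulates G.T @ G, shape (n, n)

    def precondition(self, grad: np.ndarray) -> np.ndarray:
        self.left += grad @ grad.T
        self.right += grad.T @ grad
        # (L ⊗ R) is never materialized; each factor acts on one side of G.
        left_inv_root = matrix_power_psd(self.left, -0.25, self.eps)
        right_inv_root = matrix_power_psd(self.right, -0.25, self.eps)
        return left_inv_root @ grad @ right_inv_root


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = ShampooState(m=3, n=5)
    update = state.precondition(rng.standard_normal((3, 5)))
    print(update.shape)  # (3, 5)
```

After a single step from a near-zero initialization, the output is close to the gradient's polar factor U Vᵀ (from the SVD G = U S Vᵀ), one simple way to see how this kind of preconditioning and orthogonalization are related.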