Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization

Pro-KLShampoo optimizer improves LLM pre-training through spectral structure analysis

By the PulseAugur editorial team · [1 source] · 2026-05-08 04:00

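Researchers have developed an optimization technique called Pro-KLShampoo that combines gradient preconditioning with orthogonalization to make LLM pre-training more efficient. The method exploits the spike-and-flat eigenvalue spectrum observed in KL-Shampoo preconditioners: it restricts the spectral structure to a tracked subspace and applies orthogonalization to the remaining directions. Across several pre-training scales, including GPT-2 and LLaMA models, Pro-KLShampoo outperforms standard KL-Shampoo in validation loss, memory usage, and training time.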
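The summary above does not spell out the algorithm, so the following is only a minimal NumPy sketch of the general idea it describes: scale the gradient by eigenvalue estimates inside a tracked subspace and orthogonalize what is left. It is not the paper's method. The function names, the one-sided (left-only) treatment, the inverse-square-root scaling, and the SVD-based orthogonalization are all assumptions made for illustration, and `basis` and `eigvals` are assumed to be supplied by some tracking procedure not shown here.

```python
import numpy as np


def orthogonalize(mat, rtol=1e-10):
    """Set every non-negligible singular value of `mat` to 1 (SVD-based).

    Directions with near-zero singular values are dropped so the result
    stays inside the row and column spaces of `mat`.
    """
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    keep = s > rtol * max(s[0], 1e-300)  # singular values come sorted descending
    return u[:, keep] @ vt[keep, :]


def projected_update(grad, basis, eigvals, eps=1e-8):
    """Hypothetical one-sided update (illustrative only, not the paper's rule).

    grad:    (m, n) gradient matrix
    basis:   (m, k) orthonormal columns spanning the tracked subspace (assumed given)
    eigvals: (k,) positive eigenvalue estimates for those directions (assumed given)
    """
    coeffs = basis.T @ grad  # (k, n) coordinates of the gradient in the subspace
    # Assumed scaling: inverse square root of each tracked eigenvalue.
    inside = basis @ (coeffs / np.sqrt(eigvals + eps)[:, None])
    # Component of the gradient orthogonal to the tracked subspace.
    residual = grad - basis @ coeffs
    return inside + orthogonalize(residual)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = rng.standard_normal((6, 8))
    basis, _ = np.linalg.qr(rng.standard_normal((6, 2)))  # random orthonormal basis
    update = projected_update(grad, basis, np.array([4.0, 9.0]))
    print(update.shape)
```

In this sketch only `k` eigenvalues and an `m × k` basis are stored per side instead of a full `m × m` factor, which is one plausible reading of where the reported memory savings could come from; the source itself does not confirm that detail.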

Impact: Introduces a more effective optimization method that could lower the computational cost of LLM pre-training.

Ranking rationale: Academic paper introducing a novel optimization technique for LLM pre-training. [lever_c_demoted from research: ic=1 ai=1.0]


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.LG TIER_1 English(EN) · Ruotong Sun, Ermin Wei · 2026-05-08 04:00

Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization

arXiv:2605.06316v1 Announce Type: new Abstract: Optimizers that exploit the matrix structure of gradients are central to modern LLM pre-training, with two distinct frontiers: explicit Kronecker-factored preconditioning -- most recently KL-Shampoo, which estimates the precondition…

