PulseAugur

New research explains why zeroth-order optimization scales to LLMs

Two new papers explore zeroth-order (ZO) optimization for fine-tuning large language models (LLMs). The first introduces a kernel perspective on ZO learning dynamics, showing that the approximation error depends on the model's output size rather than its parameter dimension, which offers a theoretical explanation for why ZO methods scale to LLMs. The second examines adaptive ZO optimizers: it finds that methods such as ZO-Adam offer no convergence advantage over well-tuned ZO-SGD, and it proposes MEAZO, a memory-efficient method that matches performance while reducing memory overhead.
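For readers new to ZO optimization, here is a minimal sketch of the basic idea both papers build on: estimating a gradient from loss values alone, with no backpropagation. It assumes the common two-point (SPSA-style) estimator with a Gaussian direction z, g ≈ [(L(θ + εz) - L(θ - εz)) / (2ε)] · z, driving plain ZO-SGD on a toy quadratic. The names zo_gradient and zo_sgd and all hyperparameters are illustrative, not taken from either paper.

```python
import random
from typing import Callable, List

Vector = List[float]


def zo_gradient(loss: Callable[[Vector], float], theta: Vector,
                eps: float, rng: random.Random) -> Vector:
    """Two-point (SPSA-style) zeroth-order gradient estimate.

    Only two loss evaluations are needed: at theta + eps*z and
    theta - eps*z, where z is a standard Gaussian direction. The
    finite-difference slope along z then scales z itself.
    """
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = loss([t + eps * zi for t, zi in zip(theta, z)])
    minus = loss([t - eps * zi for t, zi in zip(theta, z)])
    slope = (plus - minus) / (2.0 * eps)  # estimated directional derivative along z
    return [slope * zi for zi in z]


def zo_sgd(loss: Callable[[Vector], float], theta: Vector, lr: float = 0.01,
           eps: float = 1e-3, steps: int = 2000, seed: int = 0) -> Vector:
    """Plain ZO-SGD: theta <- theta - lr * (ZO gradient estimate)."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        g = zo_gradient(loss, theta, eps, rng)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta


if __name__ == "__main__":
    # Toy quadratic with its minimum at (1, -2, 3); no gradients are ever computed.
    target = [1.0, -2.0, 3.0]
    quad = lambda th: sum((t - c) ** 2 for t, c in zip(th, target))
    print(zo_sgd(quad, [0.0, 0.0, 0.0]))
```

Each step costs two forward evaluations and needs no backward pass or stored activations, which is what makes ZO methods attractive for memory-constrained LLM fine-tuning.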

Impact: These theoretical advances could enable more efficient and scalable fine-tuning of large language models.

Ranking rationale: Two arXiv papers present novel theoretical and algorithmic contributions to zeroth-order optimization for LLM fine-tuning.


AI-generated summary · Google Gemini · Based on 3 sources.


Sources [3]

  1. arXiv cs.LG · TIER_1 · English (EN) · Zhe Li, Bicheng Ying, Zidong Liu, Haibo Yang

    Learning Dynamics of Zeroth-Order Optimization: A Kernel Perspective

    arXiv:2605.03373v1 (Announce Type: new). Abstract: Classical optimization theory establishes that zeroth-order (ZO) algorithms suffer from a dimension-dependent slowdown, with convergence rates typically scaling with the model dimension compared to first-order methods. However, in c…

  2. arXiv cs.LG · TIER_1 · English (EN) · Hassan Dbouk, Nidham Gazagnadou, Matthias Reisser, Christos Louizos

    On Adaptivity in Zeroth-Order Optimization

    arXiv:2605.03869v1 (Announce Type: new). Abstract: We investigate the effectiveness of adaptive zeroth-order (ZO) optimization for memory-constrained fine-tuning of large language models (LLMs). Contrary to prior claims, we show that adaptive ZO methods such as ZO-Adam offer no conv…

  3. arXiv cs.LG · TIER_1 · English (EN) · Christos Louizos

    On Adaptivity in Zeroth-Order Optimization

    We investigate the effectiveness of adaptive zeroth-order (ZO) optimization for memory-constrained fine-tuning of large language models (LLMs). Contrary to prior claims, we show that adaptive ZO methods such as ZO-Adam offer no convergence advantage over well-tuned ZO-SGD, while …
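The classical "dimension-dependent slowdown" that the first abstract starts from can be checked numerically. For a standard Gaussian direction z, the estimate ĝ = z (zᵀg) is unbiased for the true gradient g, but E‖ĝ‖² = (d + 2)‖g‖², so its noise grows linearly with the parameter count d and classical analyses must shrink the step size to match. This Monte Carlo check is a sketch of that textbook fact only; it does not reproduce the paper's kernel analysis, and second_moment_ratio is a hypothetical helper.

```python
import random


def second_moment_ratio(d: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of E||g_hat||^2 / ||g||^2 with g_hat = z (z . g).

    g_hat is the ZO estimate built from the exact directional derivative
    z . g along a standard Gaussian direction z. It is unbiased for g, but
    for Gaussian z its squared norm averages (d + 2) * ||g||^2.
    """
    rng = random.Random(seed)
    g = [1.0] * d       # any fixed nonzero g works, since Gaussian z is isotropic
    g_norm2 = float(d)  # ||g||^2 for the all-ones vector
    total = 0.0
    for _ in range(trials):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        proj = sum(zi * gi for zi, gi in zip(z, g))  # z . g
        z_norm2 = sum(zi * zi for zi in z)           # ||z||^2
        total += proj * proj * z_norm2               # ||z (z . g)||^2
    return total / trials / g_norm2


if __name__ == "__main__":
    for d in (10, 100):
        ratio = second_moment_ratio(d, trials=5000)
        print(f"d={d}: estimated {ratio:.1f}, theory {d + 2}")
```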
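To make "adaptive ZO optimizer" concrete, here is a sketch that assumes ZO-Adam means the standard Adam moment updates fed with a two-point ZO gradient estimate; the paper's exact variant may differ. The detail to notice is memory: ZO-SGD keeps only θ, while this version also keeps buffers m and v of the same size as θ. Function names and hyperparameters are illustrative.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def zo_estimate(loss: Callable[[Vector], float], theta: Vector,
                eps: float, rng: random.Random) -> Vector:
    """Two-point ZO gradient estimate along one Gaussian direction z."""
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = loss([t + eps * zi for t, zi in zip(theta, z)])
    minus = loss([t - eps * zi for t, zi in zip(theta, z)])
    slope = (plus - minus) / (2.0 * eps)
    return [slope * zi for zi in z]


def zo_adam(loss: Callable[[Vector], float], theta: Vector, lr: float = 0.01,
            eps: float = 1e-3, steps: int = 3000, beta1: float = 0.9,
            beta2: float = 0.999, delta: float = 1e-8, seed: int = 0) -> Vector:
    """Adam update rule driven by ZO gradient estimates (illustrative ZO-Adam).

    Unlike ZO-SGD, which stores only theta, this keeps two extra buffers
    (m and v), each as large as theta: optimizer state that
    memory-constrained fine-tuning would rather avoid.
    """
    rng = random.Random(seed)
    theta = list(theta)
    m = [0.0] * len(theta)  # EMA of gradient estimates (first moment)
    v = [0.0] * len(theta)  # EMA of squared estimates (second moment)
    for k in range(1, steps + 1):
        g = zo_estimate(loss, theta, eps, rng)
        m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, g)]
        bc1 = 1.0 - beta1 ** k  # bias corrections for zero-initialized m and v
        bc2 = 1.0 - beta2 ** k
        theta = [t - lr * (mi / bc1) / (math.sqrt(vi / bc2) + delta)
                 for t, mi, vi in zip(theta, m, v)]
    return theta


if __name__ == "__main__":
    target = [1.0, -2.0, 3.0]
    quad = lambda th: sum((t - c) ** 2 for t, c in zip(th, target))
    print(zo_adam(quad, [0.0, 0.0, 0.0]))
```

The second paper reports that this extra state buys no convergence advantage over well-tuned ZO-SGD; the toy run here only shows the mechanics and does not test that claim.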