PulseAugur
research · [3 sources]

New research explains why Zeroth-Order Optimization scales to LLMs

Two new papers explore zeroth-order (ZO) optimization for fine-tuning large language models (LLMs). The first introduces a kernel perspective, showing that the approximation error depends on the model's output size rather than its parameter dimension, which theoretically justifies the scalability of ZO methods. The second investigates adaptive ZO optimizers and proposes MEAZO, a memory-efficient method that preserves performance while reducing memory overhead.

Summary written by gemini-2.5-flash-lite from 3 sources. How we write summaries →

IMPACT These theoretical and algorithmic advances could enable more efficient and scalable fine-tuning of large language models.

RANK_REASON Two arXiv papers present novel theoretical and algorithmic contributions to zeroth-order optimization for LLM fine-tuning.

Read on arXiv cs.LG →
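
For readers unfamiliar with the technique, the sketch below shows the generic two-point zeroth-order gradient estimator (SPSA/MeZO-style) that this line of work builds on: the gradient is estimated from two forward passes along a random perturbation, so no backward pass or optimizer state is needed. This is a minimal illustration on a toy quadratic loss, not an implementation of either paper's contribution; the function name zo_sgd_step and all hyperparameters are assumptions made for this example.

    import numpy as np

    def zo_sgd_step(params, loss_fn, lr, eps, seed):
        # Sample a random Gaussian direction z; memory-efficient variants (e.g. MeZO)
        # regenerate z from the seed instead of storing it alongside the parameters.
        rng = np.random.default_rng(seed)
        z = rng.standard_normal(params.shape)
        # Two forward passes give a finite-difference estimate of the directional
        # derivative of the loss along z; no backward pass is required.
        g_hat = (loss_fn(params + eps * z) - loss_fn(params - eps * z)) / (2.0 * eps)
        # Step against the estimated gradient direction g_hat * z.
        return params - lr * g_hat * z

    # Toy usage: drive a 50-dimensional parameter vector toward a target without backprop.
    target = np.ones(50)
    loss = lambda w: float(np.mean((w - target) ** 2))
    w = np.zeros(50)
    print("initial loss:", loss(w))
    for step in range(3000):
        w = zo_sgd_step(w, loss, lr=0.05, eps=1e-3, seed=step)
    print("final loss:", loss(w))

Storing only the seed and the scalar g_hat (rather than a full gradient) is what keeps MeZO-style fine-tuning close to inference-level memory; the kernel analysis and the MEAZO optimizer in the papers above go well beyond this sketch.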

COVERAGE [3]

  1. arXiv cs.LG TIER_1 · Zhe Li, Bicheng Ying, Zidong Liu, Haibo Yang ·

    Learning Dynamics of Zeroth-Order Optimization: A Kernel Perspective

    arXiv:2605.03373v1 Announce Type: new Abstract: Classical optimization theory establishes that zeroth-order (ZO) algorithms suffer from a dimension-dependent slowdown, with convergence rates typically scaling with the model dimension compared to first-order methods. However, in c…

  2. arXiv cs.LG TIER_1 · Hassan Dbouk, Nidham Gazagnadou, Matthias Reisser, Christos Louizos ·

    On Adaptivity in Zeroth-Order Optimization

    arXiv:2605.03869v1 Announce Type: new Abstract: We investigate the effectiveness of adaptive zeroth-order (ZO) optimization for memory-constrained fine-tuning of large language models (LLMs). Contrary to prior claims, we show that adaptive ZO methods such as ZO-Adam offer no conv…

  3. arXiv cs.LG TIER_1 · Christos Louizos ·

    On Adaptivity in Zeroth-Order Optimization

    We investigate the effectiveness of adaptive zeroth-order (ZO) optimization for memory-constrained fine-tuning of large language models (LLMs). Contrary to prior claims, we show that adaptive ZO methods such as ZO-Adam offer no convergence advantage over well-tuned ZO-SGD, while …