PulseAugur

New adaptive batch sizing method cuts training steps by up to 66%

Researchers have developed an adaptive batch sizing method that accounts for the non-Euclidean geometry of optimizers such as signSGD and spectral descent. The method estimates non-Euclidean gradient noise scales from local mini-batch gradients. In experiments on a 160-million-parameter Llama model trained with signSGD and spectral descent, it reduced training steps by up to 66% while matching the validation loss of constant-batch baselines.
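To make the idea of a gradient noise scale estimated from local mini-batch gradients concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the paper's method: it computes the familiar Euclidean noise scale (noise trace divided by squared gradient norm) by comparing per-worker gradients with their average. The function name `euclidean_noise_scale` and the input layout are hypothetical. The non-Euclidean formulas for signSGD and spectral descent are not given in this summary, so they are not reproduced here.

```python
import numpy as np


def euclidean_noise_scale(worker_grads, local_batch_size):
    """Estimate the Euclidean gradient noise scale from per-worker gradients.

    worker_grads: array of shape (n_workers, n_params); row i is the gradient
        averaged over worker i's local mini-batch (hypothetical input layout).
    local_batch_size: number of examples in each local mini-batch.
    Returns (grad_sq_norm_est, noise_trace_est, noise_scale).
    """
    grads = np.asarray(worker_grads, dtype=np.float64)
    n_workers = grads.shape[0]
    if n_workers < 2:
        raise ValueError("need at least two local gradients")

    b_small = float(local_batch_size)   # size of one local mini-batch
    b_big = b_small * n_workers         # size of the combined batch

    # Mean squared norm of the local gradients, and squared norm of their average.
    small_sq = np.mean(np.sum(grads ** 2, axis=1))
    big_sq = np.sum(grads.mean(axis=0) ** 2)

    # Unbiased estimates of |G|^2 and tr(Sigma), using
    # E|g_B|^2 = |G|^2 + tr(Sigma) / B at the two batch sizes.
    grad_sq_norm_est = (b_big * big_sq - b_small * small_sq) / (b_big - b_small)
    noise_trace_est = (small_sq - big_sq) / (1.0 / b_small - 1.0 / b_big)

    # The estimate of |G|^2 can be non-positive when noise dominates.
    if grad_sq_norm_est <= 0.0:
        noise_scale = float("inf")
    else:
        noise_scale = noise_trace_est / grad_sq_norm_est
    return grad_sq_norm_est, noise_trace_est, noise_scale
```

A larger noise scale suggests that a larger batch would still be used efficiently, which is the signal an adaptive batch size schedule can act on. How the paper turns its non-Euclidean estimate into a schedule is not described in this summary.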

IMPACT This method could lead to more efficient training of large language models, reducing computational costs and time.

RANK_REASON The cluster contains a research paper detailing a new method for optimizing machine learning training.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. arXiv cs.AI TIER_1 English(EN) · Hiroki Naganuma, Shagun Gupta, Youssef Briki, Ioannis Mitliagkas, Irina Rish, Parameswaran Raman, Hao-Jun Michael Shi

    Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent

    arXiv:2602.03001v2. Abstract: To maximize hardware utilization, modern machine learning systems typically employ large constant or manually tuned batch size schedules, relying on heuristics that are brittle and costly to tune. Existing adaptive strateg…