New adaptive batch sizing method cuts training steps by up to 66%

By PulseAugur Editorial · [1 sources] · 2026-07-03 04:00

Researchers have developed an adaptive batch sizing method that accounts for the non-Euclidean geometry of optimizers such as signSGD and spectral descent. The approach estimates non-Euclidean gradient noise scales from local mini-batch gradients and uses them to set the batch size during training. In experiments on a 160-million-parameter Llama model trained with signSGD and spectral descent, it cut training steps by up to 66% while matching the validation loss of constant-batch baselines.
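For readers unfamiliar with the term, a gradient noise scale compares how noisy a mini-batch gradient is with how large the true gradient is. The sketch below shows only the classical Euclidean version of that idea, estimated from gradients computed on separate workers. It is not the paper's non-Euclidean estimator, which the summary does not spell out. The function names `estimate_noise_scale` and `suggest_batch_size`, and the rule of clamping the batch size to the estimated noise scale, are assumptions made for illustration.

```python
import math


def sq_norm(vec):
    """Squared Euclidean norm of a flat gradient vector."""
    return sum(x * x for x in vec)


def estimate_noise_scale(local_grads, local_batch_size):
    """Estimate the Euclidean gradient noise scale from per-worker gradients.

    local_grads: list of K gradient vectors, each averaged over
        `local_batch_size` examples on one worker (K >= 2).
    Returns (trace_est, grad_sq_est, noise_scale), where noise_scale is
    trace_est / grad_sq_est, or infinity if the gradient estimate is
    not positive.
    """
    k = len(local_grads)
    if k < 2:
        raise ValueError("need gradients from at least two workers")
    dim = len(local_grads[0])
    b_small = local_batch_size
    b_big = k * local_batch_size

    # Gradient of the combined batch is the average of the local gradients.
    mean_grad = [sum(g[i] for g in local_grads) / k for i in range(dim)]
    big_sq = sq_norm(mean_grad)
    small_sq = sum(sq_norm(g) for g in local_grads) / k

    # Unbiased estimates of |G|^2 and of the trace of the gradient covariance,
    # built from squared norms measured at two different batch sizes.
    grad_sq_est = (b_big * big_sq - b_small * small_sq) / (b_big - b_small)
    trace_est = (small_sq - big_sq) / (1.0 / b_small - 1.0 / b_big)

    noise_scale = trace_est / grad_sq_est if grad_sq_est > 0 else math.inf
    return trace_est, grad_sq_est, noise_scale


def suggest_batch_size(noise_scale, min_batch, max_batch):
    """Hypothetical rule: track the noise scale, clamped to a fixed range."""
    if math.isinf(noise_scale):
        return max_batch
    return max(min_batch, min(max_batch, round(noise_scale)))


# Two workers, one example each.
trace_est, grad_sq_est, noise = estimate_noise_scale([[3.0, 1.0], [1.0, -1.0]], 1)
print(trace_est, grad_sq_est, noise)  # 4.0 2.0 2.0
print(suggest_batch_size(noise, min_batch=1, max_batch=64))  # 2
```

In practice such estimates are very noisy from a single step and are usually smoothed over many steps before being used. The method described in the article replaces the Euclidean quantities with ones matched to the geometry of signSGD and spectral descent.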

IMPACT This method could lead to more efficient training of large language models, reducing computational costs and time.

RANK_REASON The cluster contains a research paper detailing a new method for optimizing machine learning training.

AI-generated summary · Google Gemini · from 1 sources.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Hiroki Naganuma, Shagun Gupta, Youssef Briki, Ioannis Mitliagkas, Irina Rish, Parameswaran Raman, Hao-Jun Michael Shi · 2026-07-03 04:00

Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent

arXiv:2602.03001v2 Announce Type: replace-cross Abstract: To maximize hardware utilization, modern machine learning systems typically employ large constant or manually tuned batch size schedules, relying on heuristics that are brittle and costly to tune. Existing adaptive strateg…
