Compute Efficiency and Serial Runtime Tradeoffs for Stochastic Momentum Methods
A new research paper examines the tradeoff between serial runtime (the number of sequential optimization steps) and compute efficiency (roughly, batch size times the number of steps) for stochastic momentum methods such as Heavy Ball (HB) and Accelerated SGD (ASGD). The study proves finite-dimensional lower bounds on these batch-size tradeoffs, showing that HB does not improve compute efficiency over standard SGD for arbitrary spectra. Instead, HB preserves SGD-level compute efficiency over a wider range of batch sizes, which allows serial runtime to be reduced. ASGD's behavior depends on the spectrum: for rapidly decaying spectra it improves compute efficiency at small batch sizes, but it trades this advantage for serial runtime as the batch size grows.
IMPACT: This research provides theoretical insight into how to optimize training efficiency for large-scale machine learning models.
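To make the serial-runtime versus compute tradeoff concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's construction and does not reproduce its lower bounds. It assumes noiseless linear regression with Gaussian features whose covariance is diagonal with a power-law spectrum lambda_i = i^(-alpha). It iterates the exact expected-loss recursion for mini-batch heavy ball, w_{t+1} = w_t - eta * g_t + beta * (w_t - w_{t-1}), where beta = 0 recovers plain SGD. The names `power_law_spectrum`, `steps_to_target`, and `eff_lr` (the effective learning rate eta / (1 - beta)) are hypothetical. Steps to reach a target loss stand in for serial runtime, and batch size times steps stands in for compute.

```python
from typing import List, Optional


def power_law_spectrum(d: int, alpha: float) -> List[float]:
    """Assumed feature-covariance eigenvalues lambda_i = i^(-alpha), i = 1..d."""
    return [(i + 1) ** (-alpha) for i in range(d)]


def steps_to_target(lams: List[float], batch: int, eff_lr: float, beta: float,
                    target_frac: float = 0.01, max_steps: int = 3000) -> Optional[int]:
    """Exact expected-loss recursion for mini-batch heavy ball on noiseless
    Gaussian linear regression with diagonal covariance diag(lams).

    beta = 0 recovers plain mini-batch SGD. Returns the first step t with
    E[L_t] <= target_frac * L_0, or None if the run diverges or runs out of steps.
    """
    eta = eff_lr * (1.0 - beta)  # raw step size, so eff_lr = eta / (1 - beta)
    d = len(lams)
    a = [1.0] * d  # E[e_t[i]^2] with e = w - w*, starting from E[e_0 e_0^T] = I
    b = [1.0] * d  # E[e_t[i] * e_{t-1}[i]]; zero initial velocity means e_{-1} = e_0
    c = [1.0] * d  # E[e_{t-1}[i]^2]
    loss0 = 0.5 * sum(l * ai for l, ai in zip(lams, a))
    for t in range(1, max_steps + 1):
        trace = sum(l * ai for l, ai in zip(lams, a))  # E[e^T Sigma e] = 2 * E[L_t]
        new_a, new_b = [], []
        for i, l in enumerate(lams):
            r = 1.0 + beta - eta * l
            # Expected variance of the mini-batch gradient noise along coordinate i.
            noise = (l * l * a[i] + l * trace) / batch
            new_a.append(r * r * a[i] - 2.0 * r * beta * b[i]
                         + beta * beta * c[i] + eta * eta * noise)
            new_b.append(r * a[i] - beta * b[i])
        c, a, b = a, new_a, new_b
        loss = 0.5 * sum(l * ai for l, ai in zip(lams, a))
        if loss <= target_frac * loss0:
            return t
        if loss != loss or loss > 1e6 * loss0:  # NaN or blow-up: diverged
            return None
    return None


if __name__ == "__main__":
    lams = power_law_spectrum(12, 2.0)
    grid = [0.05, 0.1, 0.2, 0.4, 0.8]  # effective learning rates to try
    for batch in (1, 8, 64):
        for beta in (0.0, 0.9):
            runs = [steps_to_target(lams, batch, lr, beta) for lr in grid]
            best = min((s for s in runs if s is not None), default=None)
            compute = None if best is None else best * batch
            print(f"B={batch:3d} beta={beta}: steps={best} samples={compute}")
```

The recursion tracks only per-coordinate second moments. This works because, with Gaussian features and a diagonal covariance, the gradient-noise covariance stays diagonal, which keeps the sketch cheap enough to sweep over batch sizes and a small learning-rate grid. ASGD is not modeled here, and the grid, spectrum exponent, and loss target are arbitrary illustrative choices.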