DASH optimizer speeds up Shampoo by up to 5.6x with GPU and root-finding innovations

By PulseAugur Editorial · [1 source] · 2026-06-26 04:00

Researchers have developed DASH, a faster implementation of the Shampoo optimizer for machine learning. DASH uses batched block preconditioning to improve GPU utilization and introduces two methods for computing inverse matrix roots: the Newton-DB iteration and Chebyshev polynomial approximations. Together, these changes yield up to a 5.6x speedup in optimizer steps compared to existing Distributed Shampoo implementations, while also achieving lower validation perplexity per iteration.
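
The two ideas in the summary can be illustrated with a minimal NumPy sketch; it is not the paper's implementation. It stacks preconditioner blocks into one 3D array and runs a classical coupled Newton-Schulz iteration on all blocks at once to approximate their inverse square roots. Newton-Schulz is a stand-in chosen for illustration: DASH's Newton-DB and Chebyshev solvers are not reproduced here. The function name `batched_inverse_sqrt`, the block sizes, the root order, and the iteration count are all assumptions.

```python
import numpy as np


def batched_inverse_sqrt(blocks, num_iters=30, eps=1e-12):
    """Approximate A^{-1/2} for a stack of SPD blocks of shape (B, n, n).

    Hypothetical helper: uses the classical coupled Newton-Schulz iteration,
    NOT the Newton-DB or Chebyshev solvers described in the DASH paper.
    """
    n = blocks.shape[-1]
    eye = np.eye(n, dtype=blocks.dtype)

    # Scale each block by its Frobenius norm so its eigenvalues lie in (0, 1],
    # which is the regime where Newton-Schulz converges.
    scale = np.linalg.norm(blocks, axis=(-2, -1), keepdims=True) + eps
    y = blocks / scale                             # tends to (A / scale)^{1/2}
    z = np.broadcast_to(eye, blocks.shape).copy()  # tends to (A / scale)^{-1/2}

    for _ in range(num_iters):
        # One batched matmul chain updates every block simultaneously.
        t = 0.5 * (3.0 * eye - z @ y)
        y = y @ t
        z = t @ z

    # Undo the scaling: A^{-1/2} = (A / scale)^{-1/2} / sqrt(scale).
    return z / np.sqrt(scale)


# Build four well-conditioned 8x8 SPD blocks as a single (4, 8, 8) array.
rng = np.random.default_rng(0)
factors = rng.standard_normal((4, 8, 32))
blocks = factors @ factors.transpose(0, 2, 1)

inv_roots = batched_inverse_sqrt(blocks)
```

Every step in the loop is a batched matrix multiply, so all blocks are processed in one call rather than one block at a time, which is the kind of workload GPUs handle well.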

IMPACT Accelerates training of large machine learning models by improving optimizer efficiency.

RANK_REASON Academic paper detailing a new method for optimizing machine learning algorithms.

paper
infra

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Ionut-Vlad Modoranu, Philip Zmushko, Erik Schultheis, Mher Safaryan, Dan Alistarh · 2026-06-26 04:00

DASH: Faster Shampoo via Batched Block Preconditioning and Efficient Inverse-Root Solvers

arXiv:2602.02016v2. Abstract: Shampoo is one of the leading approximate second-order optimizers: a variant of it has won the MLCommons AlgoPerf competition, and it has been shown to produce models with lower activation outliers that are easier to compress. Y…
