Brief · PulseAugur

TOOL · arXiv cs.LG English(EN) · 8h

When to use what Schatten-$p$ norm in deep learning?

A new research paper examines when to use which Schatten-p norm in deep learning, particularly in relation to optimizers such as Muon. The study shows that the best choice depends on the regime: Schatten-p geometries with smaller p prove optimal in low-dimensional settings, including those relevant to Chinchilla scaling. The analysis also offers insight into why Muon-like methods favor large batches and gives a batch-size scaling rule across different values of p.
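The summary assumes familiarity with Schatten-p norms and with how an optimizer's geometry follows from a norm. The sketch below is not the paper's code. It uses only the textbook definitions. The Schatten-p norm of a matrix is the l_p norm of its singular values. Steepest descent under a norm moves along the unit-norm direction that best aligns with the gradient. Under Schatten-p, that direction keeps the gradient's singular vectors and reshapes its singular values using the dual exponent q = p/(p-1). At p = ∞ this gives the orthogonalized update U V^T, which Muon approximates with Newton-Schulz iterations. At p = 2 it gives the Frobenius-normalized gradient, and at p = 1 a rank-one update. The function names are hypothetical. The sketch uses an exact SVD rather than any iterative approximation, and it does not reproduce the paper's regime analysis or its batch-size rule.

```python
import numpy as np


def dual_exponent(p: float) -> float:
    """Hölder dual exponent q with 1/p + 1/q = 1 (requires p >= 1)."""
    if np.isinf(p):
        return 1.0
    if p == 1:
        return np.inf
    return p / (p - 1.0)


def schatten_norm(A: np.ndarray, p: float) -> float:
    """Schatten-p norm: the l_p norm of A's singular values (p = inf is the spectral norm)."""
    s = np.linalg.svd(A, compute_uv=False)
    if np.isinf(p):
        return float(s.max())
    return float(np.sum(s ** p) ** (1.0 / p))


def steepest_direction(G: np.ndarray, p: float) -> np.ndarray:
    """Direction D with ||D||_{S_p} = 1 that maximizes <G, D> (hypothetical helper).

    Keeps G's singular vectors and replaces its singular values s with s^(q-1),
    rescaled to unit l_p norm, where q is the dual exponent of p.
    """
    if p < 1:
        raise ValueError("Schatten-p is a norm only for p >= 1")
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    if np.isinf(p):
        # q = 1: every singular value becomes 1, i.e. the orthogonalized update U V^T
        # that Muon approximates with Newton-Schulz iterations.
        d = np.ones_like(s)
    elif p == 1:
        # q = inf: all weight goes to the top singular pair (a rank-one update).
        d = np.zeros_like(s)
        d[0] = 1.0
    else:
        d = s ** (dual_exponent(p) - 1.0)
        d = d / np.sum(d ** p) ** (1.0 / p)  # normalize so that ||d||_p = 1
    # (U * d) scales column i of U by d[i], so this equals U @ diag(d) @ Vt.
    return (U * d) @ Vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.standard_normal((4, 3))  # stand-in for a weight-matrix gradient
    for p in (1.0, 2.0, 4.0, np.inf):
        D = steepest_direction(G, p)
        # <G, D> should match the dual Schatten-q norm of G.
        print(p, round(float(np.sum(G * D)), 6), round(schatten_norm(G, dual_exponent(p)), 6))
```

Under these definitions, p interpolates between a rank-one update (p = 1), normalized gradient descent (p = 2), and Muon-style orthogonalization (p = ∞). The paper addresses which point on this range works best in which regime.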

IMPACT Provides theoretical guidance on optimizing deep learning models, potentially improving training efficiency and performance.

arXiv
Muon
Chinchilla
Schatten-p norm