A new research paper examines how best to use Schatten-p norms in deep learning, particularly in optimizers like Muon. The study shows that the effectiveness of these norms depends on the regime: smaller Schatten-p geometries prove optimal in low-dimensional settings, including those relevant to Chinchilla scaling. The analysis also offers insight into why Muon-like methods favor large batches and gives a batch-size scaling rule across different values of p.
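For readers unfamiliar with the term, the Schatten-p norm of a matrix is the ordinary vector p-norm applied to its singular values: p = 1 gives the nuclear norm, p = 2 the Frobenius norm, and p = ∞ the spectral norm, the geometry commonly associated with Muon. The snippet below is a minimal NumPy sketch of that definition only. It is not taken from the paper, and the function name `schatten_norm` and the example matrix are illustrative assumptions.

```python
import numpy as np


def schatten_norm(matrix, p):
    """Schatten-p norm: the vector p-norm of the singular values of `matrix`.

    Hypothetical helper for illustration; not code from the paper.
    """
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    if np.isinf(p):
        # p = infinity: the largest singular value (spectral norm).
        return float(singular_values.max())
    return float((singular_values ** p).sum() ** (1.0 / p))


# Illustrative matrix whose singular values are 3 and 4.
example = np.diag([3.0, 4.0])

nuclear = schatten_norm(example, 1)         # sum of singular values
frobenius = schatten_norm(example, 2)       # matches the Frobenius norm
spectral = schatten_norm(example, np.inf)   # largest singular value
```

For any fixed matrix the value is non-increasing in p, so the nuclear norm is the largest of the three and the spectral norm the smallest.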
IMPACT Provides theoretical guidance on optimizing deep learning models, potentially improving training efficiency and performance.
RANK_REASON The cluster contains a research paper published on arXiv detailing theoretical findings in deep learning optimization.
- alphaXiv
- arXiv
- CatalyzeX
- Chinchilla
- DagsHub
- Gotit.pub
- Hugging Face
- IArxiv
- Influence Flower
- Muon
- Schatten-p norm
- ScienceCast
AI-generated summary · Google Gemini · from 1 source.