Research paper details optimal Schatten-p norm usage in deep learning

By PulseAugur Editorial · [1 source] · 2026-06-16 04:00

A new research paper examines when to use which Schatten-p norm in deep learning, with particular attention to Schatten-∞ based optimizers such as Muon. The study shows that the best choice of norm depends on the training regime: Schatten-p geometries with smaller p prove optimal in low-dimensional settings, including those relevant to Chinchilla scaling. The analysis also helps explain why Muon-like methods favor large batches, and it proposes a rule for scaling the batch size across different values of p.
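
For readers new to the notation, here is a minimal NumPy sketch (not code from the paper). The Schatten-p norm of a matrix is the ℓp norm of its singular values: p = 2 gives the Frobenius norm and p = ∞ the spectral norm. Under the standard steepest-descent view of such optimizers, the update direction for a gradient G is the unit-norm matrix D that maximizes <G, D>. It keeps G's singular vectors and reshapes the singular values using the dual exponent q, where 1/p + 1/q = 1. The function names schatten_norm and steepest_direction are hypothetical illustrations.

```python
import numpy as np


def schatten_norm(G: np.ndarray, p: float) -> float:
    """Schatten-p norm: the vector l_p norm of the singular values of G."""
    s = np.linalg.svd(G, compute_uv=False)
    if np.isinf(p):
        return float(s.max())  # Schatten-inf = spectral norm
    return float(np.sum(s ** p) ** (1.0 / p))


def steepest_direction(G: np.ndarray, p: float) -> np.ndarray:
    """Hypothetical helper: the D with Schatten-p norm 1 that maximizes <G, D>.

    D shares G's singular vectors; its singular values are s**(q - 1),
    rescaled to unit l_p norm, where q is the dual exponent (1/p + 1/q = 1).
    """
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    if np.isinf(p):
        d = np.ones_like(s)        # q = 1: every singular value set to 1, i.e. U V^T
    elif p == 1.0:
        d = np.zeros_like(s)       # q = inf: keep only the top singular direction
        d[np.argmax(s)] = 1.0
    else:
        q = p / (p - 1.0)
        d = s ** (q - 1.0)
        d /= np.sum(d ** p) ** (1.0 / p)  # rescale so the Schatten-p norm of D is 1
    return (U * d) @ Vt            # U @ diag(d) @ Vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.standard_normal((4, 6))  # stand-in for a weight-matrix gradient
    for p in (1.0, 2.0, np.inf):
        D = steepest_direction(G, p)
        # Each line should end in 1.0: D always has unit Schatten-p norm.
        print(p, round(schatten_norm(D, p), 6))
```

With p = 2 the direction is simply the normalized gradient G / ||G||_F. With p = ∞ it is the orthogonalized matrix U V^T, which is the kind of update Muon approximates (Muon uses a Newton-Schulz iteration rather than an explicit SVD). The explicit SVD here is chosen for clarity, not speed.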

IMPACT Provides theoretical guidance on optimizing deep learning models, potentially improving training efficiency and performance.

RANK_REASON The cluster contains a research paper published on arXiv detailing theoretical findings in deep learning optimization.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Thomas Pethick · 2026-06-16 04:00

When to use what Schatten-$p$ norm in deep learning?

arXiv:2606.15268v1 Announce Type: new Abstract: Schatten-$\infty$ based optimizers such as Muon have shown promising empirical performance, but there remains seemingly conflicting observations regarding whether they are beneficial. We resolve this conflict by showing that the con…
