PulseAugur

Research paper details optimal Schatten-p norm usage in deep learning

A new research paper examines when each Schatten-p norm is the right choice in deep learning, particularly for optimizers like Muon. The study shows that the effectiveness of these norms depends on the regime: smaller Schatten-p geometries prove optimal in low-dimensional settings, including those relevant to Chinchilla scaling. The analysis also offers insight into why Muon-like methods favor large batches and provides a scaling rule for batch sizes across different values of p.
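For readers unfamiliar with the term, the Schatten-p norm of a matrix is the l_p norm of its singular values: p = 1 gives the nuclear norm, p = 2 the Frobenius norm, and p = ∞ the spectral norm that Muon-style optimizers are built around. The following is a minimal NumPy sketch of that standard definition, not code from the paper. The names `schatten_norm` and `spectral_steepest_direction` are hypothetical, and the second function uses an exact SVD as an idealized stand-in for the approximate orthogonalization that Muon performs in practice.

```python
import numpy as np


def schatten_norm(W, p):
    """Schatten-p norm: the l_p norm of the singular values of W."""
    s = np.linalg.svd(W, compute_uv=False)
    if np.isinf(p):
        # p = infinity: largest singular value (spectral norm).
        return float(s.max())
    return float((s ** p).sum() ** (1.0 / p))


def spectral_steepest_direction(G):
    """Idealized Schatten-infinity steepest-descent direction for G.

    Assumption for illustration: an exact SVD replaces the approximate
    orthogonalization used by Muon-like optimizers. For G = U S V^T the
    direction is U V^T, i.e. all singular values are set to 1.
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt


# Toy matrix with singular values 4 and 3.
W = np.diag([3.0, 4.0])
nuclear = schatten_norm(W, 1)          # sum of singular values
frobenius = schatten_norm(W, 2)        # matches the Frobenius norm
spectral = schatten_norm(W, np.inf)    # largest singular value
direction = spectral_steepest_direction(W)
```

Smaller p weights all singular values more evenly in the norm, while p = ∞ looks only at the largest one; this is the axis along which the paper compares geometries.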

IMPACT Provides theoretical guidance on optimizing deep learning models, potentially improving training efficiency and performance.

RANK_REASON The cluster contains a research paper published on arXiv detailing theoretical findings in deep learning optimization.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.LG · TIER_1 · English (EN) · Thomas Pethick

    When to use what Schatten-$p$ norm in deep learning?

    arXiv:2606.15268v1 Abstract: Schatten-$\infty$ based optimizers such as Muon have shown promising empirical performance, but there remain seemingly conflicting observations regarding whether they are beneficial. We resolve this conflict by showing that the con…