SGD
PulseAugur coverage of SGD — every cluster mentioning SGD across labs, papers, and developer communities, ranked by signal.
8 days with sentiment data
-
New DALS framework optimizes learning rates for neural network training
Researchers have introduced a new framework called Discriminative Adaptive Layer Scaling (DALS) to optimize learning rates in neural networks. DALS categorizes the evolution of learning rate strategies into five generat…
-
New research shows immediate derivatives suffice for online recurrent adaptation
Researchers have developed a new method for online recurrent adaptation that significantly reduces computational requirements. Their approach, termed 'Immediate Derivatives Suffice,' eliminates the need for propagating …
-
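Spectral optimizers like Muon show sharp capacity scaling on associative memory tasks
A new paper analyzes the performance of spectral optimizers such as Muon in training large language models by examining how effectively they learn associative memories. The study shows that Muon significantly outperforms standard stochastic gradient descent (SGD) at storing associations and can match Newton's method even though it uses only first-order information. The study also highlights that Muon has a larger critical batch size and a faster initial recovery rate than SGD, giving a quantitative account of the signal amplification provided by spectral preconditioners.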
-
Researchers analyze Adam's tradeoffs and enhance SignSGD with hybrid switching strategy
Two new research papers explore advancements in optimization algorithms for machine learning. One paper provides a theoretical analysis of the Adam optimizer, detailing its performance under non-stationary objectives an…
-
Researchers explore complex SGD and directional bias in kernel Hilbert spaces
Researchers have introduced a novel variant of Stochastic Gradient Descent (SGD) designed for complex-valued neural networks. This new method, termed complex SGD, offers convergence guarantees even without analyticity c…
-
Decentralized learning research shows single global merge improves performance
Researchers have demonstrated that concentrating communication in the later stages of decentralized learning can significantly improve global test performance, even under high data heterogeneity. A single global merging…
-
LoRA fine-tuning research suggests rank 1 is sufficient, proposes data-aware initialization
Three new research papers explore methods to optimize LoRA fine-tuning for large language models. One paper proposes reducing the LoRA rank threshold to 1 for binary classification tasks, showing competitive performance…
-
Papers challenge deep learning theory with generalization bound critiques
Two papers, one from 2016 by Zhang et al. and another from 2019 by Nagarajan and Kolter, are discussed for their impact on deep learning theory. The 2016 paper demonstrated that standard neural networks could easily mem…
-
New Rose optimizer offers low VRAM, fast convergence, and great results
A new PyTorch optimizer named Rose has been released under the Apache 2.0 license. Developed by Matthew K., Rose is designed to be stateless, offering significantly lower VRAM usage compared to optimizers like AdamW, wi…
-
Researchers propose Bezier Trajectory Matching for clinical dataset condensation
Researchers have introduced Bezier Trajectory Matching (BTM), a novel method for dataset condensation that improves upon existing trajectory matching techniques. BTM replaces the direct supervision of synthetic data wit…
-
New research refines SGD generalization bounds and covariance estimation
Researchers have developed new methods to analyze the generalization capabilities of Stochastic Gradient Descent (SGD) in machine learning. One paper introduces predictable history-adaptive virtual perturbations, allowi…
-
Google AI unveils Nested Learning; OpenAI advances meta-learning and AI safety
Google Research has introduced "Nested Learning," a novel machine learning paradigm designed to address the challenge of catastrophic forgetting in continual learning. This approach views models as interconnected optimi…