SGD
PulseAugur coverage of SGD — every cluster mentioning SGD across labs, papers, and developer communities, ranked by signal.
8 days with sentiment data
-
New theory explains neural network training instabilities
Researchers have developed a new theoretical framework using non-Hermitian operator theory to explain and predict training instabilities in deep neural networks. The study identifies that common optimizers like Adam and…
-
Factor Augmented SGD optimizes high-dimensional machine learning
Researchers have introduced Factor-Augmented SGD (FSGD), a novel optimization method designed for high-dimensional machine learning tasks. FSGD operates on streaming data, enabling scalability for large-scale problems w…
-
New optimizers AMUSE, MiMuon, and Pion enhance deep learning training
Researchers have developed several new optimization techniques to improve deep learning model training. AMUSE combines the rapid adaptation of Muon with the stability of Schedule-Free averaging, eliminating the need for…
-
Paper questions bias-variance tradeoff for 70B parameter transformers
A new paper explores the limitations of the bias-variance tradeoff in large transformer models, specifically those with 70 billion parameters. The research suggests that standard Stochastic Gradient Descent (SGD) method…
-
New research explores advanced optimization for machine learning
Several recent research papers explore advanced optimization techniques for machine learning. One paper introduces a derivative-free consensus-based method for nonconvex bi-level optimization, demonstrating convergence …
-
Learn2Splat optimizer enhances 3D Gaussian Splatting efficiency
Researchers have developed a novel learned optimizer for 3D Gaussian Splatting (3DGS) that improves optimization efficiency and convergence speed. This new method, called Learn2Splat, addresses limitations of standard o…
-
Convergent Abstraction Hypothesis proposes similar AI concepts from shared pressures
The Convergent Abstraction Hypothesis suggests that different cognitive systems, when faced with similar environmental pressures and learning conditions, will independently develop the same abstract concepts. This idea …
-
New theory bounds KAN training, reveals privacy-utility gap
Researchers have established new theoretical bounds for training Kolmogorov-Arnold Networks (KANs), a structured alternative to standard MLPs. The work analyzes KANs trained with mini-batch stochastic gradient descent (…
-
SGD Learns k-Juntas Efficiently with Temporal Correlations
Researchers have demonstrated that temporal correlations in data can significantly improve the efficiency of gradient-based learning methods for specific sparse problems. By using samples generated from a random walk on…
-
New theory boosts generalization for decentralized learning
Researchers have developed a new high-probability learning theory for decentralized stochastic gradient descent (D-SGD). This theory aims to close a gap in generalization guarantees between traditional SGD and D-SGD, ta…
-
New R-SGD-Mini method tackles heavy-tailed noise in optimization
Researchers have introduced a new optimization method called Robust Stochastic Gradient Descent with medoid mini-batch gradient sampling (R-SGD-Mini). This method is designed to handle heavy-tailed noise in gradient cal…
-
New analysis reveals how step size impacts SGD alignment phenomenon
This paper analyzes the phenomenon of "suspicious alignment" in stochastic gradient descent (SGD) when dealing with ill-conditioned optimization problems. The study focuses on how step size selection influences the alig…
-
New principle optimizes AI model training by aligning gradients and updates
Researchers have introduced a new principle called Greedy Alignment for selecting and tuning optimizer hyperparameters in machine learning. This principle treats optimizers as causal filters that map gradients to update…
-
SignSGD and Muon optimizers' performance gains theoretically explained
Researchers have theoretically analyzed why sign-based optimization algorithms like SignSGD and Muon can outperform standard SGD in training large models. A new study suggests that SignSGD's advantage stems from its eff…
-
GONO optimizer adapts Adam's momentum using directional consistency for better convergence
Researchers have introduced the GONO framework, an optimization signal designed to improve deep learning training by addressing the decoupling of directional alignment and loss convergence. Unlike existing optimizers th…
-
Bringing Order to Asynchronous SGD: Towards Optimality under Data-Dependent Delays with Momentum
Researchers have developed a new asynchronous framework for stochastic gradient descent (SGD) that aims to improve distributed training efficiency. This method uses momentum to preserve information from delayed gradient…
-
The Measure of Deception: An Analysis of Data Forging in Machine Unlearning
Two new research papers explore vulnerabilities and detection methods in machine unlearning, a process designed to remove specific data from trained models for privacy compliance. One paper, "DurableUn," reveals that lo…
-
Anon optimizer offers tunable adaptivity, outperforming Adam and SGD on key tasks
Researchers have introduced Anon, a novel optimizer designed to bridge the performance gap between adaptive methods like Adam and non-adaptive methods like SGD. Anon features continuously tunable adaptivity, allowing it…
-
Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning
A new theory, the Norm-Separation Delay Law, explains the phenomenon of grokking, where models generalize long after memorizing training data. Researchers demonstrated that grokking is a representational phase transitio…
-
New theories explore how pre-training and sparse connectivity enhance deep learning generalization
Three new papers explore the theoretical underpinnings of generalization in deep learning. One paper identifies pre-training as a critical factor for weak-to-strong generalization, demonstrating its emergence through a …