SGD
PulseAugur coverage of SGD — every cluster mentioning SGD across labs, papers, and developer communities, ranked by signal.
- instance of stochastic gradient descent 90%
- used by rectifier 90%
- used by AdamW 70%
- competes with muon 70%
- used by CatalyzeX 70%
- used by Gotit.pub 70%
- used by alphaXiv 70%
- used by ScienceCast 70%
- instance of ResNet-18 70%
- used by machine learning 70%
- competes with AdamW 60%
- affiliated with AdamW 60%
16 day(s) with sentiment data
-
New SCENT algorithm improves optimization for entropic risk minimization
Researchers have developed a new algorithm called SCENT for compositional entropic risk minimization, a problem formulation involving Log-Expectation-Exponential functions. Existing methods for this type of optimization…
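Some background on the Log-Expectation-Exponential form: the entropic risk of a loss ℓ at risk-aversion λ > 0 is (1/λ) log E[exp(λℓ)]. It moves between the mean loss (λ → 0) and the maximum loss on a finite sample (λ → ∞). The sketch below only evaluates that objective on a sample, using a log-sum-exp shift for numerical stability. It is not SCENT, and the helper name is hypothetical.

```python
import numpy as np


def entropic_risk(losses, lam: float) -> float:
    """Empirical entropic risk (1 / lam) * log(mean(exp(lam * losses))).

    Hypothetical helper, not SCENT. Subtracting the max before exponentiating
    (the log-sum-exp trick) keeps exp() from overflowing for large lam.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    z = lam * np.asarray(losses, dtype=float)
    m = z.max()
    return float((m + np.log(np.mean(np.exp(z - m)))) / lam)


losses = np.array([0.1, 0.2, 0.3, 5.0])  # one heavy-tail sample
for lam in (1e-3, 1.0, 100.0):
    # As lam grows, the value moves from the mean (1.4) toward the max (5.0).
    print(lam, entropic_risk(losses, lam))
```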
-
New framework reveals SGD limitations for multi-index models
A new framework has been developed to analyze the limitations of standard stochastic gradient descent (SGD) for multi-index models, which are functions dependent on low-dimensional projections of input data. This resear…
-
New decentralized AI training method finds flatter minima, beats centralized SGD
Researchers have developed a new decentralized training method called DSGD-AC that challenges the notion that decentralized learning is inherently inferior to centralized approaches. This method uses an adaptive consens…
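The summary is cut off before it describes DSGD-AC's adaptive consensus rule. For context, the sketch below shows plain decentralized SGD instead. Each node takes a local stochastic gradient step and then averages parameters with its ring neighbours through a fixed doubly stochastic mixing matrix W. The ring topology, the quadratic local losses, and the step size are all assumptions made for illustration.

```python
import numpy as np


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Doubly stochastic gossip matrix for a ring: each node weights itself
    and its two neighbours by 1/3."""
    if n < 3:
        raise ValueError("a ring needs at least 3 nodes")
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def decentralized_sgd(centers, steps=200, lr=0.1, noise_std=0.0, seed=0):
    """Plain decentralized SGD (not DSGD-AC) on local losses
    f_i(x) = 0.5 * ||x - centers[i]||^2; the global optimum is centers.mean(0)."""
    rng = np.random.default_rng(seed)
    n, d = centers.shape
    W = ring_mixing_matrix(n)
    X = np.zeros((n, d))  # row i holds node i's parameters
    for _ in range(steps):
        grads = (X - centers) + noise_std * rng.standard_normal((n, d))
        X = W @ (X - lr * grads)  # local SGD step, then one round of gossip
    return X


centers = np.random.default_rng(1).normal(size=(5, 3))
X = decentralized_sgd(centers, noise_std=0.1)
print("node average:  ", X.mean(axis=0))
print("global optimum:", centers.mean(axis=0))
```

Because W is doubly stochastic, the node average follows an ordinary gradient step on the global objective. The individual nodes differ from that average only by a consensus error.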
-
New GRAIN algorithm tackles learning instability in large AI models
Researchers have introduced GRAIN, a novel training algorithm designed to address learning instability in large, overparameterized deep learning models. GRAIN replaces the standard mean aggregation of gradients with a m…
-
AI agent compares cross-border prices with 73% click-through rate · 2 sources tracked
This build log details the creation of a cross-border price comparison agent using BuyWhere MCP and OpenAI's Agents SDK. The agent aims to find the cheapest product offers across different regions and currencies, consid…
-
New theory grounds deep learning flatness in Riemannian geometry
Researchers have developed a new theoretical framework for understanding the generalization capabilities of deep learning models by grounding the concept of flatness in Riemannian geometry. This approach utilizes the Fi…
-
New research explores nonlinear dynamics stability in GD and SGD
Researchers have investigated the stability of nonlinear dynamics in gradient descent (GD) and stochastic gradient descent (SGD) optimization algorithms, moving beyond simplified quadratic potential assumptions. The stu…
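The quadratic baseline this work moves beyond is the classical linear-stability result. On f(x) = (a/2) x^2, one GD step maps x to (1 - lr * a) x. The iterates therefore shrink if and only if |1 - lr * a| < 1, that is, 0 < lr < 2/a. In many dimensions the condition becomes lr < 2 / lambda_max of the Hessian. The toy check below assumes a curvature of a = 4, which puts the threshold at 0.5. It covers only the quadratic case and does not reproduce the paper's nonlinear analysis.

```python
def gd_on_quadratic(a: float, lr: float, x0: float = 1.0, steps: int = 100) -> float:
    """Run GD on f(x) = 0.5 * a * x**2; each step multiplies x by (1 - lr * a)."""
    x = x0
    for _ in range(steps):
        x -= lr * a * x  # the gradient of 0.5 * a * x**2 is a * x
    return x


a = 4.0  # assumed curvature; stability threshold is 2 / a = 0.5
for lr in (0.1, 0.49, 0.51):
    print(lr, abs(gd_on_quadratic(a, lr)))  # shrinks below 0.5, blows up above it
```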
-
Research paper analyzes compute efficiency and runtime tradeoffs for momentum methods
A new research paper explores the tradeoffs between serial runtime and compute efficiency for stochastic momentum methods like Heavy Ball (HB) and Accelerated SGD (ASGD). The study proves finite-dimensional lower bounds…
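Heavy Ball (Polyak momentum) adds a multiple of the previous displacement to each gradient step: x_{t+1} = x_t - lr * g_t + beta * (x_t - x_{t-1}). The sketch below runs it on an assumed 2-D ill-conditioned quadratic, with optional gradient noise. The hyperparameters are illustrative, ASGD is not shown, and nothing here reproduces the paper's lower bounds.

```python
import numpy as np


def heavy_ball(grad, x0, lr=0.05, beta=0.9, steps=500, noise_std=0.0, seed=0):
    """Heavy Ball: x_{t+1} = x_t - lr * g_t + beta * (x_t - x_{t-1}),
    where g_t is a (optionally noisy) gradient at x_t."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    x_prev = x.copy()  # zero initial momentum
    for _ in range(steps):
        g = grad(x) + noise_std * rng.standard_normal(x.shape)
        # The right-hand side uses the old x and x_prev before both are updated.
        x, x_prev = x - lr * g + beta * (x - x_prev), x
    return x


H = np.diag([1.0, 10.0])  # Hessian of f(x) = 0.5 * x^T H x, condition number 10
print(np.linalg.norm(heavy_ball(lambda x: H @ x, x0=[1.0, 1.0])))
print(np.linalg.norm(heavy_ball(lambda x: H @ x, x0=[1.0, 1.0], noise_std=0.1)))
```

With noise switched on, the iterates settle in a neighbourhood of the minimum rather than converging exactly. That neighbourhood is where the tradeoff between step count and per-step compute becomes relevant.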
-
Machine Learning in Healthcare Course Syllabus Detailed
This document outlines a comprehensive curriculum for a Machine Learning in Healthcare course. It covers fundamental concepts like the distinction between machine learning and deep learning, various neural network archi…
-
New theory explains grokking in deep neural networks via L2 phase transitions
Researchers have developed a new theory explaining the phenomenon of "grokking" in deep neural networks, where a model abruptly begins to generalize after a period of overfitting. The study, published on arXiv, proposes…
-
Mixed-Precision CA-SGD Accelerates Training on GPUs
Researchers have developed a mixed-precision communication-avoiding SGD (CA-SGD) method for generalized linear models on GPUs. This approach aims to reduce communication bottlenecks in distributed training by amortizing…
-
New Schattor optimization methods unify SGD and Muon for deep learning
Researchers have introduced Schattor, a new family of adaptive optimization methods for deep learning that utilize Schatten norms. This framework unifies existing methods like SGD and Muon, addressing challenges posed b…
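A standard way to see how one Schatten-norm family can contain both SGD and Muon is through steepest descent. Take a matrix gradient G = U S V^T. The unit Schatten-p-norm direction D that maximizes <G, D> keeps U and V and replaces each singular value s_i with s_i^(q-1), where 1/p + 1/q = 1, then rescales the result. Setting p = 2 (the Frobenius norm) gives the normalized gradient G / ||G||_F, an SGD-style direction. Setting p = ∞ (the spectral norm) gives U V^T, the orthogonalized update that Muon approximates. This is an illustrative sketch under that framing, not Schattor's actual update rule, and the helper name is hypothetical.

```python
import numpy as np


def schatten_steepest_direction(G: np.ndarray, p: float) -> np.ndarray:
    """Hypothetical helper: the matrix D with Schatten-p norm 1 that maximizes
    <G, D>. Valid for p > 1 (including p = np.inf)."""
    if not p > 1:
        raise ValueError("p must be > 1")
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    if not np.any(s > 0):
        return np.zeros_like(G)
    q = 1.0 if np.isinf(p) else p / (p - 1.0)  # dual exponent, 1/p + 1/q = 1
    # Keep the singular vectors; reshape nonzero singular values to s**(q - 1).
    d = np.where(s > 0, s ** (q - 1.0), 0.0)
    d /= np.linalg.norm(d, ord=p)  # rescale to unit Schatten-p norm
    return (U * d) @ Vt


G = np.random.default_rng(0).normal(size=(4, 3))
D_sgd = schatten_steepest_direction(G, 2.0)      # equals G / ||G||_F
D_muon = schatten_steepest_direction(G, np.inf)  # equals U V^T (orthogonalized)
print(np.allclose(D_sgd, G / np.linalg.norm(G)))
print(np.allclose(D_muon.T @ D_muon, np.eye(3)))
```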
-
New research explores advanced sampling techniques for machine learning
Two new research papers explore advanced techniques for sampling from complex probability distributions, a critical task in machine learning. The first paper, submitted to arXiv, focuses on variance reduction methods li…
-
New Theory: SA-Adam Adaptivity Asymptotically Invisible
Researchers have published a paper detailing a theoretical analysis of adaptive optimization algorithms, specifically focusing on SA-Adam with momentum and non-convergent adaptive preconditioning. The study proves a non…
-
New research explores domain generalization methods, including simple baselines and novel optimizers
Researchers are exploring new methods for domain generalization (DG) and open domain generalization (ODG) in machine learning. One study demonstrates that simple DG methods like CORAL and MMD can be competitive with mor…
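As a reference point for the "simple DG methods" mentioned here, CORAL (correlation alignment) matches second-order feature statistics across domains. Its common deep-learning form is the loss ||C_S - C_T||_F^2 / (4 d^2), where C_S and C_T are the d x d feature covariances of a source batch and a target batch. The sketch below assumes that form and uses NumPy's unbiased covariance estimate. MMD is not shown.

```python
import numpy as np


def coral_loss(source: np.ndarray, target: np.ndarray) -> float:
    """Squared Frobenius distance between feature covariances, scaled by 1 / (4 d^2).

    source: (n_s, d) feature matrix; target: (n_t, d) feature matrix.
    """
    d = source.shape[1]
    cs = np.cov(source, rowvar=False)  # rows are samples, columns are features
    ct = np.cov(target, rowvar=False)
    return float(np.sum((cs - ct) ** 2) / (4 * d * d))


rng = np.random.default_rng(0)
feats = rng.normal(size=(64, 8))
print(coral_loss(feats, feats + 3.0))  # a mean shift alone does not change covariance
print(coral_loss(feats, 2.0 * feats))  # rescaling features changes covariance
```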
-
New framework tackles data heterogeneity in hierarchical federated learning
Researchers have developed a new framework for hierarchical federated learning that addresses the issue of data heterogeneity across different clusters. The proposed DC-HierSignSGD algorithm uses binary sign-based stoch…
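The hierarchical details of DC-HierSignSGD are cut off here. As background, the sketch below shows the basic sign-based scheme, signSGD with majority vote. Each worker sends only the sign of its stochastic gradient (one bit per coordinate), and the server steps along the sign of the summed votes. The quadratic objective, worker count, noise level, and step size are assumptions made for illustration.

```python
import numpy as np


def majority_vote_sign_sgd(target, n_workers=5, lr=0.01, steps=500, noise_std=0.5, seed=0):
    """signSGD with majority vote on f(x) = 0.5 * ||x - target||^2.

    Each worker sends only sign(noisy gradient), one bit per coordinate;
    the server moves x by lr along the sign of the summed votes.
    """
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)
    x = np.zeros_like(target)
    for _ in range(steps):
        votes = np.stack([
            np.sign(x - target + noise_std * rng.standard_normal(x.shape))
            for _ in range(n_workers)
        ])
        x -= lr * np.sign(votes.sum(axis=0))  # majority vote; odd n_workers avoids ties
    return x


print(majority_vote_sign_sgd([1.0, -2.0, 0.5]))
```

Because every update has a fixed size of lr per coordinate, the iterate ends up hovering near the optimum rather than settling on it exactly.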
-
Adam vs. SGD: No single factor explains performance gap, study finds
A new research paper explores the performance gap between the Adam and SGD optimization algorithms, finding that no single factor consistently explains the difference. The study indicates that the gap arises from comple…
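For concreteness, the sketch below shows the textbook single-step update rules behind the comparison. SGD steps along the raw gradient. Adam rescales each coordinate using bias-corrected running estimates of the gradient's first and second moments. The defaults are the commonly used ones, and nothing here reproduces the study's experiments.

```python
import numpy as np


def sgd_step(x, g, lr=0.1):
    """Plain SGD: move against the raw stochastic gradient."""
    return x - lr * g


class Adam:
    """Textbook Adam with bias-corrected first/second moment estimates."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # running mean of gradients
        self.v = None  # running mean of squared gradients
        self.t = 0

    def step(self, x, g):
        if self.m is None:
            self.m = np.zeros_like(g)
            self.v = np.zeros_like(g)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g * g
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return x - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


g = np.array([1000.0, -0.001])  # badly scaled gradient coordinates
x0 = np.zeros(2)
print(sgd_step(x0, g))           # step size tracks gradient magnitude
print(Adam(lr=0.1).step(x0, g))  # first step is about lr * sign(g) per coordinate
```

The per-coordinate normalization is one of the candidate factors behind the performance gap. Per the summary, however, no single factor consistently explains it.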
-
On-Policy Distillation Updates Found to Be Sparse and Geometrically Distinct
A new research paper explores the mechanics of on-policy distillation (OPD), a post-training technique that combines on-policy student trajectories with dense teacher supervision. The study reveals that OPD updates are …
-
New research analyzes GD/SGD stability in discrete parameter spaces
Researchers have analyzed the generalization error and stability of gradient descent (GD) and stochastic gradient descent (SGD) algorithms when applied to discrete parameter spaces with rounding. Their findings indicate…
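The summary does not say which rounding scheme the paper analyzes. As an illustration only, the sketch below runs full-batch GD on a 1-D quadratic and rounds every iterate to the nearest point of a grid with spacing delta. The rounding rule, the grid spacing, and the objective are all assumptions. The example shows one simple effect of discretization: once lr * |gradient| falls below delta / 2, the rounded update maps the iterate back to itself and training stalls.

```python
import numpy as np


def rounded_gd(grad, x0, lr, delta, steps=100):
    """GD where parameters live on a grid of spacing delta: every iterate is
    rounded to the nearest grid point (round-to-nearest assumed)."""
    x = np.round(np.asarray(x0, dtype=float) / delta) * delta
    for _ in range(steps):
        x = np.round((x - lr * grad(x)) / delta) * delta
    return x


grad = lambda x: x - 0.37  # gradient of f(x) = 0.5 * (x - 0.37)**2
print(rounded_gd(grad, 2.0, lr=0.1, delta=0.1))    # stalls once lr * |grad| < delta / 2
print(rounded_gd(grad, 2.0, lr=0.1, delta=0.001))  # a finer grid gets much closer to 0.37
```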
-
Karpathy revisits 1989 neural net, cuts errors with modern AI techniques
Andrej Karpathy recreated a 1989 neural network, achieving a 60% error reduction by applying modern deep learning techniques. He demonstrated that innovations like using cross-entropy loss instead of mean squared error,…