AdamW

PulseAugur coverage of AdamW — every cluster mentioning AdamW across labs, papers, and developer communities, ranked by signal.

Total · 30d: 51 (51 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 48 (48 over 90d)

TIER MIX · 90D

research 25
tool 25
commentary 1

SENTIMENT · 30D

12 days with sentiment data

RECENT · PAGE 1/3 · 51 TOTAL

RESEARCH · CL_111222 · Jun 25 · 15:39

New framework uses survey sampling theory to improve gradient optimization

Researchers have developed a novel framework for stochastic gradient optimization that leverages survey sampling theory to reduce variance in gradient estimation. This model-assisted sampling approach incorporates auxil…
TOOL · CL_110101 · Jun 24 · 20:39

Gefen optimizer claims 8x memory reduction for LLM training

Gefen is a new optimizer designed as a drop-in replacement for AdamW, aiming to significantly reduce memory usage during model training. The developers claim Gefen can achieve up to an 8x reduction in memory requirement…
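
For context on the memory claim, here is a minimal pure-Python sketch of one standard AdamW step. It shows only the two per-parameter state buffers (`m` and `v`) that AdamW keeps alongside the weights. It does not show Gefen, whose internals are not described in this summary. The function name and default hyperparameters are illustrative assumptions.

```python
import math


def adamw_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """Apply one AdamW update in place; t is the 1-based step count."""
    for i, g in enumerate(grads):
        # First and second moment estimates: one float each per parameter.
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialized moments.
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        # Decoupled weight decay: applied to the weight, not added to the gradient.
        params[i] -= lr * weight_decay * params[i]
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


params, m, v = [1.0], [0.0], [0.0]
adamw_step(params, [0.5], m, v, t=1)
```

Because `m` and `v` each match the parameter count, the optimizer state in this sketch is twice the size of the weights, which is the baseline any memory-reduction claim would be measured against.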
RESEARCH · CL_109596 · Jun 24 · 15:46

New optimizers DMuon and HiMuon boost AI training efficiency · 6 sources tracked

Researchers have developed two new optimization techniques, DMuon and Hierarchical Muon (HiMuon), to improve the efficiency of matrix-orthogonalization-based optimizers like Muon. DMuon integrates into existing training…
TOOL · CL_105047 · Jun 22 · 17:58

Open problem: AdamW optimizer's effectiveness under heavy-tailed noise in LLMs

A recent paper poses an open problem regarding the effectiveness of the AdamW optimizer in training large language models (LLMs) under heavy-tailed noise conditions. While AdamW is widely used, its theoretical understan…
TOOL · CL_100174 · Jun 19 · 04:00

Weibull framework reveals AdamW training dynamics in transformers

A new research paper explores the evolution of weight-scale parameters in transformer models during AdamW training. The study derives a three-force decomposition of the squared weight norm, identifying alignment, inject…
RESEARCH · CL_95803 · Jun 15 · 23:43

New Theory: SA-Adam Adaptivity Asymptotically Invisible

Researchers have published a paper detailing a theoretical analysis of adaptive optimization algorithms, specifically focusing on SA-Adam with momentum and non-convergent adaptive preconditioning. The study proves a non…
RESEARCH · CL_91435 · Jun 15 · 04:00

New research tackles deep learning uncertainty and generalization

Researchers are developing new methods to improve the reliability and understanding of deep learning models. One paper introduces Calibrated Variance Propagation (CVP) to provide accurate uncertainty estimates for trans…
RESEARCH · CL_90992 · Jun 12 · 17:50

Deep Learning Models Achieve High Accuracy in Plant Disease Classification

Researchers have developed advanced deep learning frameworks for classifying plant diseases from leaf images, achieving high accuracy rates. One study focused on lemon leaf disease, utilizing ensemble models like Incept…
TOOL · CL_86796 · Jun 12 · 04:00

LoRA-Muon: New Optimizer Boosts Deep Learning Fine-Tuning Efficiency

Researchers have introduced LoRA-Muon, an optimization technique designed to improve the efficiency and effectiveness of Low-Rank Adaptation (LoRA) for deep learning models. This new method applies spectral steepest-des…
RESEARCH · CL_90893 · Jun 11 · 20:38

New optimization techniques emerge for faster, more efficient AI model training · 8 sources tracked

Several recent arXiv papers explore advancements in optimization techniques for machine learning. Researchers have proposed new methods like Weight Adaptation ASNG (WA-ASNG) to improve parallel performance in evolutiona…
TOOL · CL_84888 · Jun 11 · 04:00

New VRAdam optimizer uses physics to stabilize neural network training

Researchers have developed a new optimizer called Velocity-Regularized Adam (VRAdam) that uses physics-inspired principles to improve deep neural network training. Unlike existing methods such as Adam, VRAdam incorporates …
RESEARCH · CL_91199 · Jun 11 · 00:00

On-Policy Distillation Updates Found to Be Sparse and Geometrically Distinct

A new research paper explores the mechanics of on-policy distillation (OPD), a post-training technique that combines on-policy student trajectories with dense teacher supervision. The study reveals that OPD updates are …
RESEARCH · CL_79075 · Jun 7 · 00:51

New 'Muon' optimization technique flattens matrix gradients

A new research paper introduces "Muon," an optimization technique that replaces matrix gradients with their polar factors. This method maintains singular directions but flattens the update spectrum, which the authors su…
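
For a gradient matrix with singular value decomposition G = U S Vᵀ, the polar factor is U Vᵀ: the singular directions are kept and every singular value becomes 1. Below is a minimal pure-Python sketch, assuming a full-rank matrix and using the classic cubic Newton-Schulz iteration as a stand-in. The helper names are hypothetical, and this is not necessarily how the paper computes the factor.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def polar_factor(g, steps=30):
    """Approximate the orthogonal polar factor of a full-rank matrix g."""
    # Scale by the Frobenius norm so every singular value lies in (0, 1],
    # which is inside the convergence region of the iteration.
    norm = sum(val * val for row in g for val in row) ** 0.5
    x = [[val / norm for val in row] for row in g]
    for _ in range(steps):
        # X <- 1.5 X - 0.5 X X^T X pushes each singular value toward 1
        # while leaving the singular vectors unchanged.
        xxtx = matmul(x, matmul(transpose(x), x))
        x = [[1.5 * a - 0.5 * b for a, b in zip(row_x, row_c)]
             for row_x, row_c in zip(x, xxtx)]
    return x


grad = [[2.0, 1.0], [0.0, 1.0]]
update = polar_factor(grad)
```

The result `update` has (approximately) orthonormal columns, so its spectrum is flat even though the singular values of `grad` differ.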
TOOL · CL_74964 · Jun 6 · 13:01

Karpathy revisits 1989 neural net, cuts errors with modern AI techniques

Andrej Karpathy recreated a 1989 neural network, achieving a 60% error reduction by applying modern deep learning techniques. He demonstrated that innovations like using cross-entropy loss instead of mean squared error,…
TOOL · CL_68509 · Jun 3 · 04:00

MuLoCo framework enhances LLM training with Muon optimizer

Researchers have introduced MuLoCo, a new framework designed to optimize the training of large language models (LLMs) within the DiLoCo system. MuLoCo addresses performance degradation observed in DiLoCo as the number o…
TOOL · CL_68448 · Jun 3 · 04:00

New decomposition method reveals neural network loss landscape dynamics

Researchers have developed a new method called Spectral Alignment Decomposition to analyze the curvature exponent in neural network loss landscapes. This decomposition reveals that the exponent, which governs how Hessia…
TOOL · CL_68327 · Jun 3 · 04:00

New AI model achieves zero-shot generalization via exact equivariance

Researchers have developed a new method for building latent world models that maintain exact equivariance throughout the training process. This property allows the models to achieve zero-shot generalization across a sym…
TOOL · CL_64740 · Jun 2 · 01:39

NVIDIA Apex tutorial optimizes Transformer training with fused kernels

This tutorial demonstrates how to optimize Transformer training speed using NVIDIA Apex, focusing on its fused kernels like FusedAdam and FusedLayerNorm. It guides users through setting up Apex from source with necessar…
RESEARCH · CL_70164 · Jun 2 · 00:00

Gated Delta Networks scaling rules improve LLM training stability

Researchers have developed new scaling rules for Gated Delta Networks, a type of neural network architecture. These rules, derived through a method called coordinate-size estimation propagation, allow for stable learnin…
RESEARCH · CL_62318 · May 29 · 14:41

New SoftSignum optimizer improves deep learning parameter handling

Researchers have introduced SoftSignum, a novel optimization method designed to better handle parameter heterogeneity in deep learning. This technique smooths the sign-based update mechanism with a temperature-contro…