Researchers have introduced Zeta, a dual whitening optimizer designed to improve large-scale neural network training. Zeta targets scale heterogeneity in momentum matrices, a weakness of existing matrix-aware optimizers such as Muon. By applying coordinate whitening followed by spectral whitening, Zeta lowers the condition number of the matrix it orthogonalizes, which reduces orthogonalization error and speeds up convergence. The optimizer has shown competitive or superior performance on language modeling and vision tasks for models ranging from 0.6B to 8B parameters.
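The mechanism summarized above can be illustrated with a small NumPy sketch. The summary does not give Zeta's actual update rule, so everything here is an assumption: "coordinate whitening" is modeled as Sinkhorn-style alternating row/column RMS normalization, and "spectral whitening" as the classic cubic Newton-Schulz iteration toward the polar factor (Muon itself uses a tuned quintic variant). The helper names `coordinate_whiten`, `newton_schulz` and `orthogonality_error`, and the synthetic momentum matrix, are hypothetical. The sketch only checks the claim in the summary: removing row/column scale heterogeneity lowers the condition number, so a fixed number of Newton-Schulz steps leaves less orthogonalization error.

```python
import numpy as np


def coordinate_whiten(m: np.ndarray, iters: int = 10, eps: float = 1e-12) -> np.ndarray:
    """Hypothetical coordinate whitening: alternately rescale rows and columns
    to unit RMS (a Sinkhorn-style balancing of the squared entries)."""
    a = m.copy()
    for _ in range(iters):
        a = a / (np.sqrt(np.mean(a**2, axis=1, keepdims=True)) + eps)  # rows -> unit RMS
        a = a / (np.sqrt(np.mean(a**2, axis=0, keepdims=True)) + eps)  # columns -> unit RMS
    return a


def newton_schulz(a: np.ndarray, steps: int = 15) -> np.ndarray:
    """Spectral whitening via the classic cubic Newton-Schulz iteration,
    which pushes every singular value toward 1 (approximating U @ V.T)."""
    x = a / np.linalg.norm(a)  # Frobenius scaling keeps all singular values <= 1
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ (x.T @ x)
    return x


def orthogonality_error(x: np.ndarray) -> float:
    """Frobenius norm of X^T X - I; zero when the columns are orthonormal."""
    return float(np.linalg.norm(x.T @ x - np.eye(x.shape[1])))


# Synthetic "momentum" matrix with strong row/column scale heterogeneity (hypothetical).
rng = np.random.default_rng(0)
g = rng.standard_normal((128, 32))
row_scale = np.logspace(-2, 2, 128)[:, None]
col_scale = np.logspace(-2, 2, 32)[None, :]
momentum = row_scale * g * col_scale

single = newton_schulz(momentum)        # spectral whitening only (Muon-like, assumed)
whitened = coordinate_whiten(momentum)  # stage 1: coordinate whitening
dual = newton_schulz(whitened)          # stage 2: spectral whitening

print(f"condition number: raw={np.linalg.cond(momentum):.1e}, "
      f"coordinate-whitened={np.linalg.cond(whitened):.1e}")
print(f"orthogonalization error: spectral only={orthogonality_error(single):.1e}, "
      f"dual={orthogonality_error(dual):.1e}")
```

With the same step budget, the coordinate-whitened input should reach a much smaller residual, because the Newton-Schulz iteration grows small singular values only by about a factor of 1.5 per step; exact numbers depend on the assumed scales and step count.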
IMPACT Zeta's dual whitening approach could accelerate convergence and improve generalization in large-scale neural network training.
RANK_REASON The cluster contains a research paper detailing a new optimization technique for neural networks.
AI-generated summary · Google Gemini · from 2 sources.