PulseAugur

Zeta Optimizer Improves Neural Network Training with Dual Whitening

Researchers have introduced Zeta, a dual whitening optimizer for large-scale neural network training. Zeta targets scale heterogeneity in momentum matrices, a weakness of existing matrix-aware optimizers such as Muon. It applies coordinate whitening followed by spectral whitening: the coordinate step improves the condition number of the matrix passed to the orthogonalization step, which reduces orthogonalization error and speeds up convergence. The authors report competitive or superior performance on language modeling and vision tasks for models ranging from 0.6B to 8B parameters.
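The effect described in the summary can be illustrated with a small NumPy sketch. It is not the paper's algorithm: the summary does not give Zeta's update rule, so `row_whiten` (per-row RMS rescaling) is a hypothetical stand-in for coordinate whitening, and the classical cubic Newton-Schulz iteration is assumed as the orthogonalization (spectral whitening) step. The synthetic `momentum` matrix, its row scales, and the step count are likewise illustrative choices.

```python
import numpy as np


def row_whiten(m, eps=1e-12):
    """Hypothetical coordinate whitening: rescale every row to unit RMS.

    Assumption for illustration only; the paper's preconditioner may differ.
    """
    rms = np.sqrt(np.mean(m * m, axis=1, keepdims=True))
    return m / (rms + eps)


def newton_schulz(m, steps=10):
    """Classical cubic Newton-Schulz iteration toward the orthogonal polar factor."""
    # Dividing by the Frobenius norm keeps every singular value at or below 1,
    # which lies inside the iteration's convergence region.
    x = m / np.linalg.norm(m)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x


def orthogonalization_error(x):
    """Frobenius distance between x @ x.T and the identity (rows <= columns)."""
    return np.linalg.norm(x @ x.T - np.eye(x.shape[0]))


# Synthetic momentum matrix whose row scales span three orders of magnitude.
rng = np.random.default_rng(0)
row_scales = np.logspace(-3, 0, 32)[:, None]
momentum = row_scales * rng.standard_normal((32, 64))

whitened = row_whiten(momentum)

cond_raw = np.linalg.cond(momentum)
cond_whitened = np.linalg.cond(whitened)
err_raw = orthogonalization_error(newton_schulz(momentum))
err_whitened = orthogonalization_error(newton_schulz(whitened))

print(f"condition number: raw={cond_raw:.1f}, whitened={cond_whitened:.1f}")
print(f"orthogonalization error: raw={err_raw:.3f}, whitened={err_whitened:.3f}")
```

With a fixed number of iterations, small singular values grow by only about 1.5x per step, so a badly conditioned input leaves many of them far from 1. Equalizing row scales first narrows the singular value spread, and the same iteration budget then gets much closer to an orthogonal matrix.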

IMPACT Zeta's dual whitening approach could accelerate convergence and improve generalization in large-scale neural network training.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.LG TIER_1 English(EN) · Kaiwen Chen, Shuhai Zhang, Qiuwu Chen, Zimo Liu, Linxiao Li, Ying Sun, Yuchen Li, Yifan Zhang, Bo Han, Mingkui Tan ·

    Zeta: Dual Whitening for Matrix Optimization via Coordinate-Adaptive Preconditioning

    arXiv:2606.14187v1. Abstract: Large-scale neural network training increasingly relies on matrix-aware optimizers that exploit the structure of weight parameters beyond element-wise adaptation. However, existing matrix-aware methods such as Muon have an underappr…

  2. arXiv cs.LG TIER_1 English(EN) · Mingkui Tan ·

    Zeta: Dual Whitening for Matrix Optimization via Coordinate-Adaptive Preconditioning

    Large-scale neural network training increasingly relies on matrix-aware optimizers that exploit the structure of weight parameters beyond element-wise adaptation. However, existing matrix-aware methods such as Muon have an underappreciated vulnerability: their core operation, New…