Researchers have introduced Aurora, a new optimizer designed to improve the training of large neural networks, particularly those with rectangular weight matrices. Aurora targets problems such as neuron death in MLP layers, which can occur with existing optimizers like Muon, especially when row normalization is applied. By incorporating leverage-awareness while maintaining orthogonality, Aurora is reported to achieve a 100x data-efficiency improvement on open-source internet data and to outperform larger models on general evaluations. The optimizer is presented as a drop-in replacement with minimal overhead, and its code has been open-sourced.
Impact: The new optimizer Aurora improves training efficiency and data utilization for large models, potentially accelerating research and development.
Ranking rationale: The cluster covers a new research paper introducing a novel optimizer for neural networks, including performance benchmarks and open-sourced code.
Read on Mastodon (fosstodon.org)
AI-generated summary · Google Gemini · from 3 sources.