Tilde Research has introduced Aurora, a novel optimizer designed to train neural networks more effectively. Aurora addresses a critical issue in the popular Muon optimizer, in which a significant number of neurons become permanently inactive during training. The new optimizer, demonstrated in a 1.1B-parameter pretraining experiment, achieves state-of-the-art performance on the modded-nanoGPT speedrun benchmark, and its code has been released publicly.
Impact: Fixes a critical flaw in a widely used optimizer, potentially improving training efficiency and model performance for large-scale models.
Ranking rationale: The cluster describes the release of a new optimizer for neural network training, including experimental results and open-source code.
- AdamW
- Aurora
- Muon
- nanoGPT
- NorMuon
- Tilde Research
- U-NorMuon
- MLP neurons
- modded-nanoGPT speedrun benchmark
AI-generated summary · Google Gemini · from 2 sources.