New HORST optimizer enhances sparse transformer training

By PulseAugur Editorial Team · [1 source] · 2026-05-20 12:34

Researchers have developed HORST, a new optimizer designed to improve the training of sparse transformers. Standard optimizers struggle to encourage sparsity while keeping training stable. HORST addresses this by composing optimizer steps as non-commutative operators and integrating hyperbolic geometry, with the aim of obtaining both training stability and an L1 sparsity bias. In experiments on vision and language tasks, HORST significantly outperforms AdamW baselines, especially at higher sparsity levels.
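The summary does not spell out HORST's update rule. As a toy illustration only, the sketch below shows why the order of composed optimizer operators can matter. It composes two standard operators: a sign step, which moves every weight by the same amount (L-infinity geometry), and soft-thresholding, the proximal operator of an L1 penalty, which zeroes small weights. The helper names `sign_step` and `soft_threshold` are hypothetical, and this is not HORST's algorithm.

```python
import numpy as np

def sign_step(w: np.ndarray, grad: np.ndarray, lr: float) -> np.ndarray:
    """L-infinity-style update: every coordinate moves by exactly lr (sign descent)."""
    return w - lr * np.sign(grad)

def soft_threshold(w: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of lam * ||w||_1: shrinks weights toward zero and
    sets entries with |w_i| <= lam exactly to zero (L1 sparsity bias)."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

# Toy weights and gradient (illustrative values, not from the paper).
w = np.array([0.05, 0.3, -0.02])
grad = np.array([1.0, -1.0, -1.0])
lr, lam = 0.1, 0.1

# Order 1: stability-oriented sign step first, then sparsifying shrinkage.
step_then_shrink = soft_threshold(sign_step(w, grad, lr), lam)
# Order 2: sparsifying shrinkage first, then the sign step.
shrink_then_step = sign_step(soft_threshold(w, lam), grad, lr)

print(step_then_shrink, np.count_nonzero(step_then_shrink))
print(shrink_then_step, np.count_nonzero(shrink_then_step))
```

Here step-then-shrink leaves one nonzero weight (about `[0, 0.3, 0]`), while shrink-then-step leaves all three nonzero (about `[-0.1, 0.3, 0.1]`), because the sign step moves every coordinate by `lr`, including those the shrinkage has just zeroed. The two operators do not commute.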

Impact: Enables more efficient training of sparse transformer models, potentially leading to smaller and faster AI systems.

Ranking rationale: The cluster contains a research paper detailing a new method for training AI models.

Read on arXiv cs.LG

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.LG TIER_1 English (EN) · Rebekka Burkholz · 2026-05-20 12:34

HORST: Composing Optimizer Geometries for Sparse Transformer Training

Sparsifying transformers remains a fundamental challenge, as standard optimizers fail to simultaneously encourage sparsity and maintain training stability. Effective adaptive optimizers exhibit an implicit $L_{\infty}$ bias favoring stability, yet, sparsity requires an $L_1$ bias…
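The abstract frames stability and sparsity as biases tied to optimizer geometry. One standard way to see the contrast, which is general background and not taken from the paper, is normalized steepest descent under different norms. Under the L-infinity norm, the step that minimizes the inner product with the gradient is the negative sign of the gradient, so every weight moves by the same amount. Under the L1 norm, the step moves only the coordinate with the largest gradient magnitude. The minimal sketch below contrasts the two; the helper names are hypothetical and it does not reproduce HORST's update.

```python
import numpy as np

def steepest_step_linf(grad: np.ndarray, lr: float) -> np.ndarray:
    # Minimizer of <g, d> over ||d||_inf <= 1 is d = -sign(g):
    # every coordinate with nonzero gradient moves by exactly lr.
    return -lr * np.sign(grad)

def steepest_step_l1(grad: np.ndarray, lr: float) -> np.ndarray:
    # Minimizer of <g, d> over ||d||_1 <= 1 puts all mass on the
    # coordinate with the largest |g_i| (greedy coordinate descent),
    # so only one weight moves per step.
    step = np.zeros_like(grad)
    i = int(np.argmax(np.abs(grad)))
    step[i] = -lr * np.sign(grad[i])
    return step

g = np.array([0.5, -2.0, 0.1, 0.0])  # illustrative gradient
linf = steepest_step_linf(g, 0.1)
l1 = steepest_step_l1(g, 0.1)
print(linf, np.count_nonzero(linf))
print(l1, np.count_nonzero(l1))
```

The L-infinity step changes three of the four weights, while the L1 step changes only the weight with the largest gradient. This illustrates why one geometry is associated with uniform, stable updates and the other with sparse ones.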
