New HORST optimizer enhances sparse transformer training

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Researchers have developed HORST, an optimizer designed to improve the training of sparse transformers. According to the paper, standard optimizers fail to encourage sparsity and maintain training stability at the same time: effective adaptive optimizers carry an implicit L-infinity bias that favors stability, whereas sparsity requires an L1 bias. HORST addresses this by composing optimizer steps as non-commutative operators and integrating hyperbolic geometry, with the aim of obtaining both stability and an L1 sparsity bias. The reported experiments show HORST outperforming AdamW baselines across vision and language tasks, especially at higher sparsity levels.
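To make "non-commutative operators" concrete, here is a minimal toy sketch. It is not HORST's algorithm, which the source does not spell out. It assumes two stand-in update rules: a sign-based step (steepest descent under the L-infinity norm) and a mirror-descent step with an arcsinh mirror map, a form of hyperbolic geometry often associated with sparsity-favoring behavior when its scale `beta` is small. The function names, the quadratic toy loss, and all parameter values are hypothetical, chosen only for illustration. The point is simply that applying the two steps in different orders gives different weights.

```python
import math

def grad(w, target):
    # Gradient of the toy loss 0.5 * sum((w_i - target_i)^2). Hypothetical loss.
    return [wi - ti for wi, ti in zip(w, target)]

def sign_step(w, target, lr=0.1):
    # Sign-based step: every coordinate with a nonzero gradient moves by exactly lr.
    g = grad(w, target)
    return [wi - lr * math.copysign(1.0, gi) if gi != 0 else wi
            for wi, gi in zip(w, g)]

def hyperbolic_step(w, target, lr=0.1, beta=0.1):
    # Mirror-descent step with mirror map arcsinh(w / beta):
    # take the gradient step in the mirrored coordinates, then map back with sinh.
    g = grad(w, target)
    return [beta * math.sinh(math.asinh(wi / beta) - lr * gi)
            for wi, gi in zip(w, g)]

def compose(first, second, w, target):
    # Apply `first`, then `second`; each step recomputes the gradient at its input.
    return second(first(w, target), target)

w0 = [0.5, -0.02, 0.0]
target = [1.0, 0.0, 0.0]

sign_then_hyp = compose(sign_step, hyperbolic_step, w0, target)
hyp_then_sign = compose(hyperbolic_step, sign_step, w0, target)

print("sign -> hyperbolic:", sign_then_hyp)
print("hyperbolic -> sign:", hyp_then_sign)
```

The two printed vectors differ in their nonzero coordinates, so the order of composition matters in this toy setting. How HORST chooses and orders its own operators is described in the paper, not here.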

IMPACT Enables more efficient training of sparse transformer models, potentially leading to smaller and faster AI systems.

RANK_REASON The cluster contains a research paper detailing a new method for training AI models.

COVERAGE [1]

arXiv cs.LG TIER_1 · Rebekka Burkholz · 2026-05-20 12:34

HORST: Composing Optimizer Geometries for Sparse Transformer Training

Sparsifying transformers remains a fundamental challenge, as standard optimizers fail to simultaneously encourage sparsity and maintain training stability. Effective adaptive optimizers exhibit an implicit $L_{\infty}$ bias favoring stability, yet sparsity requires an $L_1$ bias…
