GLU structures accelerate LLM optimization by reshaping NTK spectrum

By PulseAugur Editorial · [2 sources] · 2026-05-20 05:50

Researchers have investigated why Gated Linear Units (GLU) outperform non-gated structures in large language models. Their analysis in the neural tangent kernel (NTK) regime indicates that GLU reshapes the NTK spectrum, yielding a smaller condition number and faster convergence. Although GLU appears to accelerate optimization, the empirical results suggest it does little to reduce the generalization gap in models such as ViT and GPT-2.
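To illustrate why a smaller condition number can mean faster convergence, here is a minimal sketch. It is not the paper's method or data. It assumes the standard NTK-regime picture in which gradient descent on a squared loss shrinks each residual component along a kernel eigenvector by a factor of `1 - lr * eigenvalue` per step. The two spectra are synthetic, and the names `steps_to_converge` and `condition_number` are hypothetical helpers introduced only for this example.

```python
import numpy as np


def condition_number(eigvals):
    """Ratio of the largest to the smallest eigenvalue of a positive-definite kernel."""
    eigvals = np.asarray(eigvals, dtype=float)
    return eigvals.max() / eigvals.min()


def steps_to_converge(eigvals, tol=1e-6, max_steps=100_000):
    """Count gradient-descent steps until the residual norm falls below tol
    times its initial norm, working directly in the kernel's eigenbasis.

    Assumption: linearized (NTK-regime) dynamics with squared loss, so the
    residual along eigenvector i is multiplied by (1 - lr * eigvals[i]) each step.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    # Best fixed step size for a quadratic: balances the slowest and fastest modes.
    lr = 2.0 / (eigvals.max() + eigvals.min())
    residual = np.ones_like(eigvals)
    initial_norm = np.linalg.norm(residual)
    for step in range(1, max_steps + 1):
        residual = (1.0 - lr * eigvals) * residual
        if np.linalg.norm(residual) <= tol * initial_norm:
            return step
    return max_steps


# Synthetic spectra (illustrative only, not taken from the paper).
well_conditioned = np.linspace(1.0, 10.0, 50)    # condition number 10
ill_conditioned = np.linspace(1.0, 1000.0, 50)   # condition number 1000

for name, spectrum in [("well", well_conditioned), ("ill", ill_conditioned)]:
    print(name, condition_number(spectrum), steps_to_converge(spectrum))
```

With the best fixed step size, the slowest mode contracts by (κ − 1)/(κ + 1) per step, where κ is the condition number, so the better-conditioned spectrum reaches the tolerance in far fewer steps.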

IMPACT Explains a key architectural advantage in LLMs, potentially guiding future model design for faster training.

RANK_REASON The cluster contains an academic paper detailing research findings on model architecture.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Xingyu Lyu, Qianqian Xu, Zhiyong Yang, Peisong Wen, Qingming Huang · 2026-05-22 04:00

The Devil is in the Condition Numbers: Why is GLU Better than non-GLU Structure?

arXiv:2605.20749v1 Announce Type: cross Abstract: Gated Linear Units (GLU) and their variants are widely adopted in modern open-source large language model architectures and consistently outperform their non-gated counterparts, yet the underlying reasons for this advantage remain…

arXiv cs.AI TIER_1 English(EN) · Qingming Huang · 2026-05-20 05:50

The Devil is in the Condition Numbers: Why is GLU Better than non-GLU Structure?

Gated Linear Units (GLU) and their variants are widely adopted in modern open-source large language model architectures and consistently outperform their non-gated counterparts, yet the underlying reasons for this advantage remain unclear. In this work, we study GLU by analyzing …
