PulseAugur

New theory models LLM training as noisy channel communication

Researchers have introduced the Shannon Scaling Law, a theoretical framework for understanding Large Language Model (LLM) training. The framework treats LLM training as information transmission over a noisy channel, drawing a parallel to the Shannon-Hartley theorem. It explains non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute, through the signal-to-noise ratio (SNR) and its interaction with model capacity and training data. In experiments on Pythia and OLMo2 models, the Shannon Scaling Law reportedly predicts model performance more accurately than existing scaling laws, including when extrapolating to unseen model sizes.
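For readers unfamiliar with the Shannon-Hartley theorem referenced above, the sketch below computes the classical channel capacity C = B * log2(1 + S/N). It illustrates only the textbook theorem: how the paper maps bandwidth, signal, and noise onto model capacity and training data is not stated in this summary, so the function and parameter names here are illustrative assumptions, not the authors' formulation.

```python
import math


def shannon_hartley_capacity(bandwidth_hz: float,
                             signal_power: float,
                             noise_power: float) -> float:
    """Classical Shannon-Hartley capacity in bits per second.

    C = B * log2(1 + S/N), where S/N is the linear (not dB)
    signal-to-noise ratio. Names are illustrative assumptions;
    this is not the paper's Shannon Scaling Law.
    """
    if bandwidth_hz < 0 or signal_power < 0 or noise_power <= 0:
        raise ValueError("bandwidth and signal must be >= 0, noise must be > 0")
    snr = signal_power / noise_power
    return bandwidth_hz * math.log2(1.0 + snr)


# With fixed bandwidth and signal power, raising the noise power
# lowers the SNR and therefore the capacity.
low_noise = shannon_hartley_capacity(1.0, 3.0, 1.0)   # SNR = 3
high_noise = shannon_hartley_capacity(1.0, 3.0, 3.0)  # SNR = 1
print(low_noise, high_noise)
```

In the classical theorem, capacity grows with SNR and shrinks as noise grows. A framework built on this relationship could in principle capture performance that degrades when effective noise rises, which is the kind of non-monotonic behavior the summary describes.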

IMPACT Provides a new theoretical lens for understanding LLM scaling, potentially guiding future model development and optimization strategies.

RANK_REASON The cluster contains an academic paper proposing a new theoretical framework for LLM scaling laws.


AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

  1. arXiv cs.AI TIER_1 English(EN) · Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma

    LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

    arXiv:2605.23901v1. Abstract: Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance…

  2. arXiv cs.AI TIER_1 English(EN) · Yiyuan Ma

    LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

    Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute. We propos…

  3. Hugging Face Daily Papers TIER_1 English(EN)

    LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

    The Shannon Scaling Law models LLM training as information transmission over a noisy channel, explaining non-monotonic performance phenomena through signal-to-noise ratio interactions and demonstrating superior predictive accuracy over traditional scaling laws.