Brief · PulseAugur

RESEARCH · Hugging Face Daily Papers · English (EN) · 4d · [3 sources]

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Researchers have introduced the Shannon Scaling Law, a theoretical framework for understanding Large Language Model (LLM) training. The framework treats LLM training as information transmission through a noisy channel, by analogy with the Shannon-Hartley theorem. It explains non-monotonic phenomena such as overtraining and quantization-induced degradation by analyzing the signal-to-noise ratio (SNR) in relation to model capacity and training data. In experiments on Pythia and OLMo2 models, the Shannon Scaling Law predicted model performance significantly better than existing scaling laws, including when extrapolating to model sizes not seen during fitting.
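For reference, the Shannon-Hartley theorem that the framework draws on gives the capacity of a bandwidth-limited channel with additive white Gaussian noise as C = B log2(1 + S/N), where B is the bandwidth and S/N the linear signal-to-noise power ratio. The sketch below computes only this classical quantity; it is not the Shannon Scaling Law itself, whose functional form this brief does not give. The function names are illustrative assumptions. It makes the logarithmic dependence on SNR concrete: each doubling of 1 + S/N adds exactly one bit per second per hertz.

```python
import math


def shannon_hartley_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Classical Shannon-Hartley capacity C = B * log2(1 + S/N), in bits/s.

    bandwidth_hz: channel bandwidth B in hertz.
    snr_linear:   signal-to-noise power ratio S/N as a plain ratio (not dB).
    Illustrative helper only; not the paper's scaling law.
    """
    if bandwidth_hz < 0 or snr_linear < 0:
        raise ValueError("bandwidth and SNR must be non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


def db_to_linear(snr_db: float) -> float:
    """Convert an SNR in decibels to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


if __name__ == "__main__":
    # Normalized bandwidth B = 1 Hz, so capacity reads as bits/s/Hz.
    bandwidth = 1.0
    # Each step doubles 1 + S/N (2, 4, 8, 16, 32), so capacity grows by one bit per step.
    for snr in (1, 3, 7, 15, 31):
        capacity = shannon_hartley_capacity(bandwidth, snr)
        print(f"S/N = {snr:>2}: C = {capacity:.1f} bits/s/Hz")
```

The diminishing return, where ever-larger increases in SNR buy only additive gains in capacity, is the property of the channel model that makes capacity, rather than raw signal strength, the limiting quantity.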

IMPACT Provides a new theoretical lens for understanding LLM scaling, potentially guiding future model development and optimization strategies.

Large Language Models
Pythia
Shannon Scaling Law
OLMo2
Shannon-Hartley theorem