New Theory Models LLM Training as Communication over a Noisy Channel

By the PulseAugur editorial team · [3 sources] · 2026-05-22 00:00

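Researchers have introduced the Shannon Scaling Law, a new theoretical framework for understanding how large language models (LLMs) train. It treats LLM training as information transmission over a noisy channel, echoing the Shannon-Hartley theorem. By analyzing the signal-to-noise ratio (SNR) associated with model capacity and training data, the framework explains non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance worsens despite more compute. Experiments on Pythia and OLMo2 models show that the Shannon Scaling Law predicts model performance markedly better than existing scaling laws and can even extrapolate to unseen model sizes.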
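For readers unfamiliar with the theorem the summary refers to, the classical Shannon-Hartley result gives the capacity of a noisy channel as C = B · log2(1 + S/N). The sketch below only illustrates that textbook formula. The function name and arguments are hypothetical, and the sources summarized here do not spell out how the paper maps model size or training data onto bandwidth and SNR, so this should not be read as the paper's own scaling law.

```python
import math


def shannon_hartley_capacity(bandwidth: float, snr: float) -> float:
    """Classical channel capacity C = B * log2(1 + S/N).

    `bandwidth` is B (Hz in the communication setting) and `snr` is the
    linear, non-decibel signal-to-noise ratio. Both names are illustrative
    assumptions, not terms taken from the paper.
    """
    if bandwidth < 0 or snr < 0:
        raise ValueError("bandwidth and snr must be non-negative")
    return bandwidth * math.log2(1.0 + snr)


if __name__ == "__main__":
    # Capacity grows only logarithmically with SNR: each tenfold increase
    # in SNR adds a roughly constant amount once SNR is large.
    for snr in (1.0, 10.0, 100.0, 1000.0):
        print(f"SNR={snr:>7.1f}  capacity per unit bandwidth = "
              f"{shannon_hartley_capacity(1.0, snr):.3f} bits")
```

The logarithmic dependence on SNR is what makes the formula saturate: once noise dominates, adding more signal yields little extra capacity, which is the intuition the noisy-channel analogy draws on.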

Impact: Offers a new theoretical lens on LLM scaling that may guide future model development and optimization strategies.

Ranking rationale: This cluster contains an academic paper proposing a new theoretical framework for LLM scaling laws.


AI-generated summary · Google Gemini · based on 3 sources.

Sources [3]

arXiv cs.AI TIER_1 English(EN) · Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma · 2026-05-25 04:00

Large Language Models as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

arXiv:2605.23901v1 (cross-list). Abstract: Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance…
arXiv cs.AI TIER_1 English(EN) · Yiyuan Ma · 2026-05-22 17:59

Large Language Models as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute. We propos…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-22 00:00

Large Language Models as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

The Shannon Scaling Law models LLM training as information transmission over a noisy channel, explaining non-monotonic performance phenomena through signal-to-noise ratio interactions and demonstrating superior predictive accuracy over traditional scaling laws.
