PulseAugur

New theory models LLM training as communication over a noisy channel

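Researchers have introduced the Shannon Scaling Law, a new theoretical framework for understanding large language model (LLM) training. It treats LLM training as information transmission over a noisy channel, echoing the Shannon-Hartley theorem. By analyzing model capacity and the signal-to-noise ratio (SNR) associated with the training data, the framework explains non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation. In experiments on Pythia and OLMo2 models, the Shannon Scaling Law predicts model performance significantly better than existing scaling laws and even extrapolates to unseen model sizes.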
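For readers unfamiliar with the analogy, here is a minimal Python sketch of the classical Shannon-Hartley theorem, C = B log2(1 + S/N), which the summary says the framework echoes. It is not the paper's Shannon Scaling Law, whose exact form is not given in this summary; the function names `shannon_hartley_capacity` and `db_to_linear` are illustrative choices, not taken from the paper.

```python
import math


def db_to_linear(snr_db: float) -> float:
    """Convert a signal-to-noise ratio from decibels to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def shannon_hartley_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Classical channel capacity in bits per second: C = B * log2(1 + S/N).

    This is the textbook Shannon-Hartley formula only. How the paper maps
    model size and training data onto bandwidth and SNR is not stated in
    the summary, so no such mapping is attempted here.
    """
    if bandwidth_hz < 0 or snr_linear < 0:
        raise ValueError("bandwidth and SNR must be non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    # Capacity per unit bandwidth at a few SNR levels.
    for snr_db in (0, 10, 20, 30):
        capacity = shannon_hartley_capacity(1.0, db_to_linear(snr_db))
        print(f"SNR {snr_db:>2} dB -> {capacity:.3f} bits/s per Hz")
```

In this classical formula, capacity rises with SNR and falls when noise grows faster than signal. That gives some intuition for how an SNR-based view could accommodate performance that degrades despite increased compute, which a monotonic power law cannot express.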

Impact: Offers a new theoretical perspective on LLM scaling that may guide future model development and optimization strategies.

Ranking rationale: This cluster contains an academic paper proposing a new theoretical framework for LLM scaling laws.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · based on 3 sources.

Sources [3]

  1. arXiv cs.AI TIER_1 English(EN) · Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma ·

    Large Language Models as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

    arXiv:2605.23901v1 Announce Type: cross Abstract: Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance…

  2. arXiv cs.AI TIER_1 English(EN) · Yiyuan Ma ·

    Large Language Models as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

    Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute. We propos…

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    Large Language Models as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

    The Shannon Scaling Law models LLM training as information transmission over a noisy channel, explaining non-monotonic performance phenomena through signal-to-noise ratio interactions and demonstrating superior predictive accuracy over traditional scaling laws.