PulseAugur
English(EN) Improved Scaling Laws via Weak-to-Strong Generalization in Random Feature Ridge Regression

New model explains neural scaling laws through sparse features

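Researchers have developed a new model for understanding neural scaling laws under sparse activations. In the model, test loss can be dominated by rare coordinates that are never observed during training, which creates a distinctive bottleneck absent from dense models. The study derives the asymptotic population loss, showing a double-descent peak near the interpolation threshold and different scaling exponents in the overparameterized and underparameterized regimes, with a gap that depends on the sparsity.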
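
The "rare coordinates never seen in training" mechanism can be illustrated with a toy calculation. The sketch below is not the paper's model: it assumes each coordinate k fires independently with a hypothetical power-law probability p_k = scale * k^(-alpha), and it uses only the elementary fact that such a coordinate stays unobserved in n i.i.d. samples with probability (1 - p_k)^n. All function names and parameter values are illustrative assumptions.

```python
def unseen_probability(p: float, n: int) -> float:
    """Probability that a coordinate firing with probability p
    is never active in n independent training samples."""
    return (1.0 - p) ** n


def power_law_probs(d: int, alpha: float, scale: float = 0.5) -> list[float]:
    """Hypothetical activation probabilities p_k = scale * k^(-alpha), k = 1..d.
    (An assumption for illustration, not taken from the paper.)"""
    return [scale * k ** (-alpha) for k in range(1, d + 1)]


def expected_unseen_mass(probs: list[float], n: int) -> float:
    """Expected total activation probability carried by coordinates
    that were never observed in n training samples."""
    return sum(p * unseen_probability(p, n) for p in probs)


if __name__ == "__main__":
    probs = power_law_probs(d=10_000, alpha=1.5)
    for n in (10, 100, 1_000, 10_000):
        # The unseen mass shrinks as n grows, but the rare tail decays slowly.
        print(n, expected_unseen_mass(probs, n))
```

In this toy setting, adding training samples quickly covers the frequent coordinates, while the rare tail keeps contributing unseen mass, which gives an intuition for why unobserved coordinates could act as a bottleneck.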

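Impact: Introduces a theoretical framework for understanding limits on model performance caused by sparse data, which could guide future model architectures and training strategies.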

Ranking rationale: The cluster contains an academic paper detailing a new model of neural scaling laws.

AI-generated summary · Google Gemini · based on 3 sources.

Sources [3]

  1. arXiv cs.LG TIER_1 English(EN) · Diyuan Wu, Lehan Chen, Theodor Misiakiewicz, Marco Mondelli ·

    Improved Scaling Laws via Weak-to-Strong Generalization in Random Feature Ridge Regression

    arXiv:2603.05691v2 Announce Type: replace Abstract: It is increasingly common in machine learning to use learned models to label data and then employ such data to train more capable models. The phenomenon of weak-to-strong generalization exemplifies the advantage of this two-stag…

  2. arXiv stat.ML TIER_1 English(EN) · John Sous, Michael Winer ·

    Asymmetric Scaling Laws from Sparse Features

    arXiv:2605.23591v1 Announce Type: new Abstract: We introduce a model for neural scaling laws under sparse activations. In the model, test loss is often dominated by rare coordinates that are never observed in the training input. This mechanism induces a novel bottleneck absent fr…

  3. arXiv stat.ML TIER_1 English(EN) · Michael Winer ·

    Asymmetric Scaling Laws from Sparse Features

    We introduce a model for neural scaling laws under sparse activations. In the model, test loss is often dominated by rare coordinates that are never observed in the training input. This mechanism induces a novel bottleneck absent from dense models. We derive the asymptotic popula…