PulseAugur
When Is a Draft Accepted? A Theory of Acceptance in Speculative Decoding

New theory explains draft acceptance in speculative decoding for LLMs

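Researchers have developed a new theoretical framework for understanding speculative decoding in large language models, focusing on the acceptance criteria used in practice rather than on exact distribution-preserving sampling. The theory characterizes the rejection region as lower level sets of the target distribution, and it gives exact KL-divergence certificates and margin-based bounds for acceptance rules such as greedy decoding and the top-m criterion. Evaluations with Qwen3 models show that relaxed and tree-based acceptance strategies substantially expand certified acceptance, especially at low-margin decoding steps.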
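To make the acceptance rules named in the summary concrete, here is a minimal Python sketch of a deterministic top-m check for a single drafted token. It is an illustration under stated assumptions, not the paper's implementation: the function names `top_m_accepts` and `top1_margin` are hypothetical, the target distribution is passed as a plain list of probabilities, and "margin" is assumed here to mean the gap between the two largest target probabilities, which the summary does not define. Greedy acceptance is the special case m = 1.

```python
from typing import Sequence


def top_m_accepts(target_probs: Sequence[float], draft_token: int, m: int = 1) -> bool:
    """Accept the drafted token if it ranks among the m most probable
    tokens under the target distribution (m=1 is greedy acceptance).

    Hypothetical helper for illustration only. Ties are resolved in
    favor of acceptance.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    p = target_probs[draft_token]
    # Number of tokens the target model ranks strictly above the draft.
    strictly_higher = sum(1 for q in target_probs if q > p)
    # Rejected tokens are those whose target probability falls below the
    # m-th largest value, i.e. a lower level set of the target distribution.
    return strictly_higher < m


def top1_margin(target_probs: Sequence[float]) -> float:
    """Gap between the largest and second-largest target probabilities.

    Assumed definition of "margin" for this sketch.
    """
    ordered = sorted(target_probs, reverse=True)
    return ordered[0] - ordered[1]


# Toy target distribution over a 4-token vocabulary (made-up numbers).
probs = [0.5, 0.3, 0.15, 0.05]
greedy_ok = top_m_accepts(probs, draft_token=1, m=1)   # token 1 is not the argmax
relaxed_ok = top_m_accepts(probs, draft_token=1, m=2)  # token 1 is in the top 2
gap = top1_margin(probs)
```

In this toy case the drafted token is rejected under greedy acceptance but accepted once the rule is relaxed to top-2, which is the kind of step where a relaxed criterion changes the outcome.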

Impact: Provides a theoretical foundation for optimizing speculative decoding, which could enable more efficient LLM inference.

Ranking rationale: Academic paper presenting a new theoretical framework for speculative decoding.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv stat.ML · Aaryam Sharma

    When Is a Draft Accepted? A Theory of Acceptance in Speculative Decoding

    arXiv:2606.30265v1 (announce type: cross). Abstract: Speculative decoding accelerates language model inference by using a fast drafter to propose candidate tokens that are then verified by a larger target model. Existing theory largely studies the stochastic, distribution-preserving…

  2. arXiv stat.ML · Aaryam Sharma

    When Is a Draft Accepted? A Theory of Acceptance in Speculative Decoding

    Speculative decoding accelerates language model inference by using a fast drafter to propose candidate tokens that are then verified by a larger target model. Existing theory largely studies the stochastic, distribution-preserving setting, where the goal is to exactly sample from…