PulseAugur
English(EN) When Hidden States Drift: Can KV Caches Rescue Long-Range Speculative Decoding?

New methods speed up large language model inference by improving speculative decoding

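Researchers are developing new methods to speed up large language model (LLM) inference, which is typically slowed by sequential, token-by-token decoding. Several recent papers explore speculative decoding, a technique in which a smaller "draft" model proposes tokens that a larger "target" model then verifies. The innovations include combining multi-draft and block verification strategies, using KV caches to give the drafter richer signals, and developing training-free methods that accept drafts that are semantically correct even when they are not an exact match. These methods aim to substantially increase decoding speed while preserving output quality and generalizing across models and tasks.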
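To make the draft-then-verify loop concrete, here is a minimal sketch of the basic greedy, exact-match form of speculative decoding. It is not the method of any paper listed below. `target_model` and `draft_model` are hypothetical toy functions standing in for real LLMs, and `speculative_decode`, `greedy_decode`, and the parameter `k` are names chosen only for this illustration. In a real system the target model would score all `k` drafted positions in a single parallel forward pass; the sketch calls it once per position for clarity and counts verification rounds instead.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical interface: a "model" maps a token context to its greedy next token.
NextTokenFn = Callable[[List[Token]], Token]


def target_model(context: List[Token]) -> Token:
    """Toy stand-in for the large target model (an assumption, not a real LLM)."""
    return (sum(context) * 31 + len(context)) % 50


def draft_model(context: List[Token]) -> Token:
    """Toy stand-in for the small draft model: usually agrees with the target."""
    guess = target_model(context)
    # Deliberately disagree on some contexts so that rejection is exercised.
    return guess if len(context) % 4 != 0 else (guess + 1) % 50


def greedy_decode(prompt: List[Token], model: NextTokenFn, n: int) -> List[Token]:
    """Baseline: plain sequential decoding, one model call per new token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    prompt: List[Token],
    draft: NextTokenFn,
    target: NextTokenFn,
    num_new_tokens: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Return the decoded tokens and the number of target verification rounds."""
    tokens = list(prompt)
    goal = len(prompt) + num_new_tokens
    rounds = 0
    while len(tokens) < goal:
        # 1) The draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2) The target model verifies the proposal (one round).
        rounds += 1
        accepted: List[Token] = []
        for i in range(k):
            expected = target(tokens + proposal[:i])
            if proposal[i] == expected:
                accepted.append(proposal[i])   # draft token matches: keep it
            else:
                accepted.append(expected)      # first mismatch: use target's token
                break
        else:
            # Every drafted token was accepted: the target adds one bonus token.
            accepted.append(target(tokens + proposal))

        tokens.extend(accepted)
    return tokens[:goal], rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    output, rounds = speculative_decode(prompt, draft_model, target_model, 20)
    # Exact-match verification reproduces the target's own greedy output.
    print(output == greedy_decode(prompt, target_model, 20), rounds)
```

Because a drafted token is kept only when it equals the target model's own greedy choice, the output is identical to decoding with the target model alone; any speedup comes from needing fewer verification rounds than generated tokens. The relaxed-acceptance and multi-draft ideas described above change step 2, which this sketch does not attempt to model.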

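Impact: The new speculative decoding methods could substantially speed up LLM inference, lowering operating costs and enabling real-time applications.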

Ranking rationale: Several academic papers published on arXiv introduce new speculative decoding techniques for LLM inference.


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

  1. arXiv cs.CL TIER_1 English(EN) · Yijun Lin, Jinhao Sheng, Qingyue Cai, Feng Zhou ·

    SpecTr-GBV: Multi-Draft Block Verification Accelerating Speculative Decoding

    arXiv:2604.25925v1 · Announce Type: new · Abstract: Autoregressive language models suffer from high inference latency due to their sequential decoding nature. Speculative decoding (SD) mitigates this by employing a lightweight draft model to propose candidate tokens, which are select…

  2. arXiv cs.CL TIER_1 English(EN) · Tianyu Liu, Yuhao Shen, Xinyi Hu, Baolin Zhang, Hengxin Zhang, Jun Dai, Jun Zhang, Shuang Ge, Lei Chen, Yue Li, MingCheng Wan ·

    When Hidden States Drift: Can KV Caches Rescue Long-Range Speculative Decoding?

    arXiv:2604.26412v1 · Announce Type: new · Abstract: Speculative decoding accelerates LLM inference, but SOTA hidden-state-based drafters suffer from long-range decay: draft accuracy degrades as the speculative step increases. Existing work attributes this decay to train-inference mis…

  3. arXiv cs.CL TIER_1 English(EN) · Tianyu Liu, Qitan Lv, Hao Li, Xing Gao, Xiao Sun, Xiaoyan Sun ·

    LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation

    arXiv:2507.01449v3 · Announce Type: replace · Abstract: Speculative decoding (SD), where a small draft model is employed to propose draft tokens in advance and then the target model validates them in parallel, has emerged as a promising technique for LLM inference acceleration. Many …

  4. arXiv cs.CL TIER_1 English(EN) · Jinze Li, Yixing Xu, Guanchen Li, Shuo Yang, Jinfeng Xu, Xuanwu Yin, Dong Li, Edith C. H. Ngai, Emad Barsoum ·

    Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match

    arXiv:2511.22972v3 · Announce Type: replace · Abstract: Large language models (LLMs) achieve strong performance across diverse tasks but suffer from high inference latency due to their autoregressive generation. Speculative Decoding (SPD) mitigates this issue by verifying candidate t…

  5. arXiv cs.CL TIER_1 English(EN) · MingCheng Wan ·

    When Hidden States Drift: Can KV Caches Rescue Long-Range Speculative Decoding?

    Speculative decoding accelerates LLM inference, but SOTA hidden-state-based drafters suffer from long-range decay: draft accuracy degrades as the speculative step increases. Existing work attributes this decay to train-inference mismatch and proposes test-time training (TTT) as a…