
New techniques such as UniVer and SpecKV speed up LLM inference through speculative decoding

By the PulseAugur editorial team · [7 sources] · 2026-04-27 04:00

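Researchers have developed new methods for accelerating large language model (LLM) inference. UniVer offers a unified treatment of multi-step and multi-draft speculative decoding and increases acceptance length by up to 8.5%. Speculative Speculative Decoding (SSD) runs speculation and verification in parallel, and its optimized Saguaro algorithm achieves up to a 5x speedup over autoregressive decoding. SpecKV adds an adaptive controller that dynamically selects the speculation length γ from model-compression and draft-model signals, improving performance by 56.0% over fixed-length speculation.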
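To see why the speculation length γ is worth tuning at all, the following is a minimal sketch of the textbook cost model for speculative decoding, not SpecKV's controller. It assumes each drafted token is accepted independently with probability `alpha` and that one draft step costs `draft_cost` relative to one target-model step; the function names and parameters are hypothetical and chosen only for illustration.

```python
def expected_tokens(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per verification round.

    Assumes every drafted token is accepted independently with
    probability alpha (a simplifying assumption, not a measured value).
    """
    if alpha >= 1.0:
        # Every draft is accepted, plus one bonus token from the target.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, draft_cost: float) -> float:
    """Expected speedup over plain autoregressive decoding.

    One round costs gamma draft steps plus one target step, with the
    target step normalized to cost 1.
    """
    return expected_tokens(alpha, gamma) / (gamma * draft_cost + 1.0)


def best_gamma(alpha: float, draft_cost: float, max_gamma: int = 16) -> int:
    """Speculation length in [1, max_gamma] with the highest expected speedup."""
    return max(
        range(1, max_gamma + 1),
        key=lambda g: expected_speedup(alpha, g, draft_cost),
    )


if __name__ == "__main__":
    for alpha in (0.5, 0.7, 0.9):
        g = best_gamma(alpha, draft_cost=0.1)
        print(alpha, g, round(expected_speedup(alpha, g, 0.1), 3))
```

Under this model the best γ grows with the acceptance rate, so a fixed γ is only optimal for one operating point. That is the gap an adaptive controller aims to close.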

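Impact: The new speculative decoding techniques promise to significantly increase LLM inference speed, reducing compute cost and latency.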

Ranking rationale: Multiple arXiv papers introduce new techniques for accelerating LLM inference.


AI-generated summary · Google Gemini · from 7 sources.

Sources [7]

arXiv cs.LG TIER_1 English(EN) · Yepeng Weng, Qiao Hu, Takehisa Yairi · 2026-05-07 04:00

UniVer: A Unified Perspective for Multi-step and Multi-draft Speculative Decoding

arXiv:2605.04543v1 Announce Type: cross Abstract: Speculative decoding accelerates Large Language Models via draft-then-verify, where verification can be framed as an Optimal Transport (OT) problem. Existing approaches typically handle multi-draft and multi-step aspects in isolat…
arXiv cs.CL TIER_1 English(EN) · Takehisa Yairi · 2026-05-06 06:42

UniVer: A Unified Perspective for Multi-step and Multi-draft Speculative Decoding

Speculative decoding accelerates Large Language Models via draft-then-verify, where verification can be framed as an Optimal Transport (OT) problem. Existing approaches typically handle multi-draft and multi-step aspects in isolation, applying either flat OT to single-step drafts…
arXiv cs.LG TIER_1 English(EN) · Tanishq Kumar, Tri Dao, Avner May · 2026-05-06 04:00

Speculative Speculative Decoding

arXiv:2603.03251v3 Announce Type: replace Abstract: Autoregressive decoding is bottlenecked by its sequential nature. Speculative decoding has become a standard way to accelerate inference by using a fast draft model to predict upcoming tokens from a slower target model, and then…
arXiv cs.CL TIER_1 English(EN) · Shikhar Shukla · 2026-05-05 04:00

SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection

arXiv:2605.02888v1 Announce Type: cross Abstract: Speculative decoding accelerates large language model (LLM) inference by using a small draft model to propose candidate tokens that a larger target model verifies. A critical hyperparameter in this process is the speculation lengt…
arXiv cs.CL TIER_1 English(EN) · Shikhar Shukla · 2026-05-04 17:55

SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection

Speculative decoding accelerates large language model (LLM) inference by using a small draft model to propose candidate tokens that a larger target model verifies. A critical hyperparameter in this process is the speculation length $γ$, which determines how many tokens the draft …
arXiv cs.CL TIER_1 English(EN) · Shikhar Shukla · 2026-05-04 17:55

SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection

Speculative decoding accelerates large language model (LLM) inference by using a small draft model to propose candidate tokens that a larger target model verifies. A critical hyperparameter in this process is the speculation length $γ$, which determines how many tokens the draft …
arXiv cs.LG TIER_1 English(EN) · Muhammad Shafique, Abdul Basit, Muhammad Abdullah Hanif, Alberto Marchisio, Rachmad Vidya Wicaksana Putra, Minghao Shao · 2026-04-27 04:00

Focus Session: Hardware and Software Techniques for Accelerating Multimodal Foundation Models

arXiv:2604.21952v1 Announce Type: new Abstract: This work presents a multi-layered methodology for efficiently accelerating multimodal foundation models (MFMs). It combines hardware and software co-design of transformer blocks with an optimization pipeline that reduces computatio…
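The speculative decoding abstracts listed above all rely on a draft-then-verify loop. As a rough illustration, the sketch below implements the standard single-draft verification rule: accept a drafted token with probability min(1, p/q), otherwise resample from the normalized residual max(0, p - q). It is not the UniVer, SSD, or SpecKV method, and the function name `verify_draft` and its argument layout are assumptions made for this example.

```python
import random


def verify_draft(draft_tokens, q_probs, p_probs, rng):
    """Verify one drafted sequence against the target model.

    draft_tokens: token ids proposed by the draft model.
    q_probs[i]:   draft distribution (list of floats) at position i.
    p_probs[i]:   target distribution at position i; it needs one extra
                  entry at the end for the bonus token.
    rng:          a random.Random instance.
    Returns the tokens emitted in this round.
    """
    emitted = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_probs[i], q_probs[i]
        # Accept with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            emitted.append(tok)
            continue
        # Rejected: resample from the residual max(0, p - q).
        # rng.choices normalizes the weights itself.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        emitted.append(rng.choices(range(len(p)), weights=residual)[0])
        return emitted
    # Every draft was accepted: sample one bonus token from the target.
    last = p_probs[-1]
    emitted.append(rng.choices(range(len(last)), weights=last)[0])
    return emitted


if __name__ == "__main__":
    rng = random.Random(0)
    dist = [0.2, 0.3, 0.5]
    # Identical draft and target distributions: all drafts are accepted.
    print(verify_draft([2, 1], [dist, dist], [dist, dist, dist], rng))
```

This rule keeps the output distribution equal to the target model's while letting several tokens be committed per target-model call. The number of drafted tokens per round is the speculation length γ mentioned in the SpecKV abstract.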
