CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference

New CLP method accelerates LLM inference without quality loss

By the PulseAugur editorial team · [2 sources] · 2026-06-09 14:45

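Researchers have developed a new method, Collocation-Length Prediction (CLP), to accelerate large language model inference. CLP addresses a core problem in multi-token prediction (MTP): the prediction heads for subsequent tokens interfere with the main language-model head, which degrades output quality. CLP redesigns the architecture so that the main head always generates the first token, while a lightweight CLP layer predicts the tokens that follow. This yields a significant speedup without sacrificing output quality. In experiments on Qwen2.5 models, the method reaches speedups of up to 1.29x with a negligible repetition rate.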
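The division of labor described above can be illustrated with a toy decoding loop. This is only a sketch under stated assumptions, not the paper's implementation: `main_head`, `clp_layer`, `decode`, the fixed `TARGET` sentence, and the `COLLOCATIONS` lookup table are all hypothetical stand-ins, and the real CLP layer is a learned component whose design and training are not described in this summary.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-ins (assumptions for illustration, not taken from the paper).
TARGET: List[str] = "the team in New York studies machine learning".split()
COLLOCATIONS: Dict[str, List[str]] = {"New": ["York"], "machine": ["learning"]}


def main_head(context: List[str]) -> str:
    """Stand-in for the main LM head: returns the next token of TARGET."""
    return TARGET[len(context)]


def clp_layer(context: List[str], first_token: str) -> List[str]:
    """Stand-in for the lightweight CLP layer: proposes follow-up tokens.

    A lookup table replaces the learned predictor; an empty list means
    "emit only the first token on this pass".
    """
    return COLLOCATIONS.get(first_token, [])


def decode(
    prompt: List[str],
    head: Callable[[List[str]], str],
    clp: Callable[[List[str], str], List[str]],
    max_new_tokens: int,
) -> Tuple[List[str], int]:
    """Generate tokens and count how many forward passes were needed."""
    out = list(prompt)
    produced = 0
    passes = 0
    while produced < max_new_tokens:
        passes += 1
        first = head(out)              # the main head always supplies token 1
        extra = clp(out, first)        # the auxiliary layer may add more
        step = ([first] + extra)[: max_new_tokens - produced]
        out.extend(step)
        produced += len(step)
    return out[len(prompt):], passes


tokens, passes = decode([], main_head, clp_layer, max_new_tokens=len(TARGET))
print(tokens, passes)
```

In this toy run the two collocations each save one pass, so eight tokens are produced in six passes. Any real speedup depends on how often multi-token steps occur and on the overhead of the extra layer, neither of which this sketch models.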

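Impact: Introduces a novel, lightweight approach to accelerating LLM inference, with the potential to reduce compute cost and latency for real-time applications.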

Ranking rationale: This cluster contains an academic paper detailing a new method for improving LLM inference efficiency.


AI-generated summary · Google Gemini · based on 2 sources.

Coverage sources [2]

arXiv cs.AI TIER_1 English(EN) · Xuezhen Xie, Zhiqiang Zhou · 2026-06-10 04:00

CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference

arXiv:2606.10935v1 Announce Type: cross Abstract: Large language model inference is bottlenecked by autoregressive decoding, where each token requires a full forward pass. Multi-token prediction (MTP) offers a promising acceleration path, but existing approaches suffer from a fun…
arXiv cs.AI TIER_1 English(EN) · Zhiqiang Zhou · 2026-06-09 14:45

CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference

Large language model inference is bottlenecked by autoregressive decoding, where each token requires a full forward pass. Multi-token prediction (MTP) offers a promising acceleration path, but existing approaches suffer from a fundamental architectural flaw: the MTP head for the …
