CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference

New CLP method accelerates LLM inference with zero quality loss

By the PulseAugur editorial team · [1 source] · 2026-06-09 14:45

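Researchers have developed a method called Collocation-Length Prediction (CLP) to accelerate large language model inference. CLP targets a key problem in multi-token prediction (MTP): the prediction heads can degrade output quality. It addresses this by ensuring that the backbone language model always generates the first token. The approach is lightweight, using a single linear layer to predict how many additional tokens can be safely accepted, and achieves up to a 1.29x speedup on Qwen2.5 models with no loss in quality.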
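The decoding step described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`linear`, `predict_extra_tokens`, `clp_step`), the toy weights, and the argmax readout over length classes are all assumptions made for illustration. The only parts taken from the summary are that the backbone's token always comes first and that a single linear layer decides how many extra tokens to accept.

```python
from typing import List, Sequence


def linear(hidden: Sequence[float],
           weight: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Single linear layer: one logit per candidate acceptance length."""
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weight, bias)]


def predict_extra_tokens(hidden, weight, bias) -> int:
    """Assumed readout: argmax over logits, where index k means
    'accept k extra tokens' (k = 0 accepts none)."""
    logits = linear(hidden, weight, bias)
    return max(range(len(logits)), key=lambda k: logits[k])


def clp_step(backbone_token: int,
             mtp_draft: Sequence[int],
             hidden, weight, bias) -> List[int]:
    """Emit the backbone's own token first, then as many drafted
    tokens as the length predictor allows."""
    k = min(predict_extra_tokens(hidden, weight, bias), len(mtp_draft))
    return [backbone_token] + list(mtp_draft[:k])


# Toy setup (hypothetical values): hidden size 2, up to 2 extra tokens,
# so the linear layer has 3 output classes (0, 1, or 2 extra tokens).
weight = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
bias = [0.0, 0.0, 0.0]

confident = clp_step(7, [8, 9], hidden=[0.2, 0.9], weight=weight, bias=bias)
cautious = clp_step(7, [8, 9], hidden=[0.9, 0.2], weight=weight, bias=bias)
```

In this sketch the first emitted token never comes from the MTP draft, so the worst case (zero extra tokens accepted) reduces to ordinary one-token-per-pass decoding.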

Impact: Accelerates LLM inference, which could enable faster and more efficient deployment of AI applications.

Ranking rationale: An academic paper introducing a new method for accelerating LLM inference.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Zhiqiang Zhou · 2026-06-09 14:45

CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference

Large language model inference is bottlenecked by autoregressive decoding, where each token requires a full forward pass. Multi-token prediction (MTP) offers a promising acceleration path, but existing approaches suffer from a fundamental architectural flaw: the MTP head for the …

