AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices

AHASD Architecture Improves LLM Speculative Decoding Performance on Mobile Devices

By the PulseAugur editorial team · [2 sources] · 2026-04-28 07:42

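Researchers have developed AHASD, a novel asynchronous heterogeneous architecture designed to optimize large language model (LLM) inference on mobile devices. The architecture uses task-level decoupling so that draft generation and verification run in parallel, and combines it with adaptive control mechanisms that suppress low-confidence drafts. AHASD integrates dedicated attention computation and task scheduling units in PIM memory, with the aim of reducing idle overhead and wasted computation.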
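As a rough illustration of the draft-and-verify loop described above, the following is a minimal sketch and not the AHASD implementation. It assumes greedy decoding, toy deterministic models, and a simple confidence threshold as the "adaptive control"; the names `draft_tokens`, `verify`, `speculative_decode`, `toy_dlm`, `toy_tlm`, `max_draft`, and `min_conf` are all hypothetical. It runs drafting and verification sequentially, so it does not model the asynchronous, hardware-level decoupling that the summary attributes to AHASD, and it checks draft tokens one at a time rather than in a single batched pass.

```python
from typing import Callable, List, Tuple

# Hypothetical toy interface: a model maps a token prefix to
# (next_token, confidence in [0, 1]). Real models return logits.
Model = Callable[[List[int]], Tuple[int, float]]


def draft_tokens(dlm: Model, prefix: List[int], max_draft: int,
                 min_conf: float) -> List[int]:
    """Draft up to max_draft tokens, stopping early on low confidence."""
    drafted: List[int] = []
    for _ in range(max_draft):
        token, conf = dlm(prefix + drafted)
        if conf < min_conf:
            break  # assumed adaptive rule: suppress low-confidence drafts
        drafted.append(token)
    return drafted


def verify(tlm: Model, prefix: List[int], drafted: List[int]) -> List[int]:
    """Greedy verification: keep the matching run of draft tokens, then
    append one token chosen by the target model itself."""
    accepted: List[int] = []
    for token in drafted:
        target_token, _ = tlm(prefix + accepted)
        if target_token != token:
            accepted.append(target_token)  # first mismatch: take the target's token
            return accepted
        accepted.append(token)
    target_token, _ = tlm(prefix + accepted)
    accepted.append(target_token)  # all drafts accepted: one extra target token
    return accepted


def speculative_decode(dlm: Model, tlm: Model, prompt: List[int],
                       num_tokens: int, max_draft: int = 4,
                       min_conf: float = 0.5) -> List[int]:
    """Generate num_tokens tokens after the prompt (sequential sketch)."""
    out = list(prompt)
    while len(out) - len(prompt) < num_tokens:
        drafted = draft_tokens(dlm, out, max_draft, min_conf)
        out.extend(verify(tlm, out, drafted))
    return out[: len(prompt) + num_tokens]


def toy_tlm(prefix: List[int]) -> Tuple[int, float]:
    # Toy target model: always continues with (last + 1) mod 10.
    return (prefix[-1] + 1) % 10, 1.0


def toy_dlm(prefix: List[int]) -> Tuple[int, float]:
    # Toy draft model: agrees with the target, except that after tokens
    # 4 and 9 it guesses 0 and reports low confidence.
    if prefix[-1] % 5 == 4:
        return 0, 0.2
    return (prefix[-1] + 1) % 10, 0.9
```

With greedy verification, the output matches what the target model would produce on its own; the draft model only changes how many target-model steps are needed, which is why suppressing drafts that are likely to be rejected saves wasted computation.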

Impact: Optimizes LLM inference on mobile devices, potentially enabling more capable AI applications on resource-constrained hardware.

Ranking rationale: This is a research paper detailing a new architecture for LLM inference optimization.

AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Ma Zirui, Fan Zhihua, Li Wenxing, Wu Haibin, Zhang Fulin, Ye Xiaochun, Li Wenming · 2026-04-30 04:00

AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices

arXiv:2604.25326v2 Announce Type: replace-cross Abstract: Speculative decoding enhances the inference efficiency of large language models (LLMs) by generating drafts using a small draft language model (DLM) and verifying them in batches with a large target language model (TLM). H…
arXiv cs.AI TIER_1 English(EN) · Li Wenming · 2026-04-28 07:42

AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices

Speculative decoding enhances the inference efficiency of large language models (LLMs) by generating drafts using a small draft language model (DLM) and verifying them in batches with a large target language model (TLM). However, adaptive drafting inference on a mobile single-NPU…