AHASD architecture boosts LLM speculative decoding on mobile devices

作者 PulseAugur 编辑部 · [2 个来源] · 2026-04-28 07:42

Researchers have developed AHASD, a novel asynchronous heterogeneous architecture designed to optimize large language model (LLM) inference on mobile devices. This architecture employs task-level decoupling for parallel drafting and verification, incorporating adaptive control mechanisms to suppress low-confidence drafts. AHASD integrates specialized units for attention computation and task scheduling within PIM memory, aiming to reduce idle overhead and wasted computation. AI

影响 Optimizes LLM inference on mobile devices, potentially enabling more powerful AI applications on resource-constrained hardware.

排序理由 This is a research paper detailing a new architecture for LLM inference optimization.

在 arXiv cs.AI 阅读 →

AI 生成摘要 · Google Gemini · 来自 2 个来源。我们如何撰写摘要 →

报道来源 [2]

arXiv cs.AI TIER_1 English(EN) · Ma Zirui, Fan Zhihua, Li Wenxing, Wu Haibin, Zhang Fulin, Ye Xiaochun, Li Wenming · 2026-04-30 04:00

AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices

arXiv:2604.25326v2 Announce Type: replace-cross Abstract: Speculative decoding enhances the inference efficiency of large language models (LLMs) by generating drafts using a small draft language model (DLM) and verifying them in batches with a large target language model (TLM). H…
arXiv cs.AI TIER_1 English(EN) · Li Wenming · 2026-04-28 07:42

AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices

Speculative decoding enhances the inference efficiency of large language models (LLMs) by generating drafts using a small draft language model (DLM) and verifying them in batches with a large target language model (TLM). However, adaptive drafting inference on a mobile single-NPU…

报道来源 [2]

AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices

AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices

相关实体

相关话题