PulseAugur
实时 12:03:26

AHASD architecture boosts LLM speculative decoding on mobile devices

Researchers have developed AHASD, a novel asynchronous heterogeneous architecture designed to optimize large language model (LLM) inference on mobile devices. This architecture employs task-level decoupling for parallel drafting and verification, incorporating adaptive control mechanisms to suppress low-confidence drafts. AHASD integrates specialized units for attention computation and task scheduling within PIM memory, aiming to reduce idle overhead and wasted computation. AI

影响 Optimizes LLM inference on mobile devices, potentially enabling more powerful AI applications on resource-constrained hardware.

排序理由 This is a research paper detailing a new architecture for LLM inference optimization.

在 arXiv cs.AI 阅读 →

AI 生成摘要 · Google Gemini · 来自 2 个来源。 我们如何撰写摘要 →

AHASD architecture boosts LLM speculative decoding on mobile devices

报道来源 [2]

  1. arXiv cs.AI TIER_1 English(EN) · Ma Zirui, Fan Zhihua, Li Wenxing, Wu Haibin, Zhang Fulin, Ye Xiaochun, Li Wenming ·

    AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices

    arXiv:2604.25326v2 Announce Type: replace-cross Abstract: Speculative decoding enhances the inference efficiency of large language models (LLMs) by generating drafts using a small draft language model (DLM) and verifying them in batches with a large target language model (TLM). H…

  2. arXiv cs.AI TIER_1 English(EN) · Li Wenming ·

    AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices

    Speculative decoding enhances the inference efficiency of large language models (LLMs) by generating drafts using a small draft language model (DLM) and verifying them in batches with a large target language model (TLM). However, adaptive drafting inference on a mobile single-NPU…