HCInfer system enables LLMs on resource-constrained devices with error compensation

By PulseAugur Editorial Team · [1 source] · 2026-05-08 04:00

Researchers have developed HCInfer, an inference system designed to run large language models (LLMs) efficiently on devices with limited memory. The system keeps the main compressed model on the GPU and offloads parts of the model's error-compensation mechanism to the CPU. It also uses an asynchronous pipeline and dynamic rank allocation to minimize overhead and maximize accuracy, reportedly improving accuracy by up to 5.2% and achieving a speedup of 10.4x compared to full-precision models.
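
The summary does not spell out how the compensation is computed. One common way to compensate a compressed weight matrix is to add a low-rank approximation of the compression error, which would fit the mention of rank allocation. The following is a minimal sketch under that assumption, not HCInfer's implementation. The quantizer, matrix sizes, rank, and function names are all hypothetical, and everything runs in NumPy on the CPU; the comments only mark which path would sit on which device in the split the summary describes.

```python
import numpy as np

def quantize_uniform(w, bits=4):
    """Symmetric per-tensor uniform quantization (a stand-in compressor, assumed)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale).clip(-levels, levels) * scale

def low_rank_compensator(residual, rank):
    """Best rank-`rank` approximation of the residual via truncated SVD."""
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # shape (rows, rank)
    b = vt[:rank, :]             # shape (rank, cols)
    return a, b

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))   # hypothetical full-precision weights
x = rng.standard_normal((64, 8))    # hypothetical batch of activations

w_q = quantize_uniform(w, bits=4)
a, b = low_rank_compensator(w - w_q, rank=8)

y_ref = w @ x                 # full-precision reference output
y_q = w_q @ x                 # "GPU" path: compressed weights only
y_comp = y_q + a @ (b @ x)    # plus "CPU" path: low-rank correction

err_q = np.linalg.norm(y_ref - y_q)
err_comp = np.linalg.norm(y_ref - y_comp)
print(f"error without compensation: {err_q:.4f}")
print(f"error with rank-8 compensation: {err_comp:.4f}")
```

Because the correction is applied as `a @ (b @ x)`, the two small matrix products can be computed separately from the main product and added afterwards, which is what makes it plausible to run the correction on a different device. A higher rank removes more of the error at the cost of more compute and memory for the compensator.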

Impact: Enables efficient deployment of LLMs on resource-constrained devices, potentially broadening access and application.

Ranking rationale: This is a research paper detailing a new inference system for LLMs.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.LG TIER_1 English(EN) · Shen Xu, Xiangwen Zhuge, Zhe Xu, Yingkun Hu, Zheng Yang, Yunhao Liu · 2026-05-08 04:00

HCInfer: An Efficient Inference System via Error Compensation for Resource-Constrained Devices

arXiv:2605.05819v1 · Abstract: LLMs often struggle with memory-constrained deployment on consumer-grade hardware due to their massive parameter sizes. While existing solutions such as model compression and offloading improve deployment feasibility, they often suf…
