QKVShare framework enables efficient quantized KV-cache handoff for on-device LLMs

By PulseAugur Editorial Team · [2 sources] · 2026-05-05 15:44

Researchers have introduced QKVShare, a framework for transferring latent context between agents in multi-agent LLM systems running on edge devices. Instead of an expensive re-prefill or a full-precision KV transfer, it hands off a quantized KV cache, combining token-level mixed-precision allocation with a CacheCard representation and a HuggingFace-compatible injection path. In experiments with Llama-3.1-8B-Instruct on GSM8K problems, adaptive quantization remained competitive under repeated handoffs and significantly reduced handoff latency compared with full re-prefill.
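The idea of token-level mixed-precision allocation can be illustrated with a small sketch. This is not the paper's implementation: the per-token importance scores, the 8-bit/4-bit split, the 25% high-precision fraction, and all function names below are assumptions chosen only for illustration, and the quantizer is plain per-token min-max quantization rather than whatever scheme QKVShare actually uses.

```python
from typing import List, Tuple

Packed = Tuple[int, List[int], float, float]  # (bits, codes, scale, offset)


def quantize_vector(vec: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Uniform min-max quantization of one token's vector to `bits` bits."""
    levels = (1 << bits) - 1
    lo, hi = min(vec), max(vec)
    # A constant vector has zero range; any non-zero scale works because all codes are 0.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in vec]
    return codes, scale, lo


def dequantize_vector(codes: List[int], scale: float, lo: float) -> List[float]:
    """Map integer codes back to approximate floating-point values."""
    return [c * scale + lo for c in codes]


def allocate_bits(importance: List[float], high_bits: int = 8,
                  low_bits: int = 4, high_fraction: float = 0.25) -> List[int]:
    """Give the most important tokens `high_bits`, all others `low_bits`.

    `importance` is a hypothetical per-token score (for example, derived from
    attention statistics); the source does not say how tokens are ranked.
    """
    n_high = max(1, int(len(importance) * high_fraction))
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    high = set(ranked[:n_high])
    return [high_bits if i in high else low_bits for i in range(len(importance))]


def pack_cache(kv_rows: List[List[float]], importance: List[float]) -> List[Packed]:
    """Quantize each token's row at its allocated precision."""
    bits = allocate_bits(importance)
    return [(b, *quantize_vector(row, b)) for row, b in zip(kv_rows, bits)]


def unpack_cache(packed: List[Packed]) -> List[List[float]]:
    """Reconstruct approximate rows on the receiving agent's side."""
    return [dequantize_vector(codes, scale, lo) for _, codes, scale, lo in packed]
```

In this toy version the sender would call `pack_cache` and the receiver `unpack_cache`; each row's reconstruction error is bounded by half of that row's quantization step, so tokens given more bits are recovered more precisely.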

Impact: Could make on-device multi-agent LLM systems more efficient by reducing context-transfer overhead.

Ranking rationale: Academic paper describing a new framework for LLM context transfer.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Pratik Honavar, Tejpratap GVSL · 2026-05-07 04:00

QKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMs

arXiv:2605.03884v1 Announce Type: new Abstract: Multi-agent LLM systems on edge devices need to hand off latent context efficiently, but the practical choices today are expensive re-prefill or full-precision KV transfer. We study QKVShare, a framework for quantized KV-cache hando…

arXiv cs.AI TIER_1 English(EN) · Tejpratap GVSL · 2026-05-05 15:44

QKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMs

Multi-agent LLM systems on edge devices need to hand off latent context efficiently, but the practical choices today are expensive re-prefill or full-precision KV transfer. We study QKVShare, a framework for quantized KV-cache handoff between agents that combines token-level mixe…

