New Caching Techniques Boost Efficiency of LLMs and Diffusion Models

By PulseAugur Editorial Team · [5 sources] · 2026-06-11 09:51

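Researchers have developed MiniPIC, an efficient new caching method for large language model inference that requires fewer than 100 lines of code changes to existing systems such as vLLM. The method raises prefill throughput by 49% and significantly reduces latency for cached spans. In addition, a new technique for diffusion models called BudCache optimizes the caching strategy under a fixed compute budget to preserve output quality, and it outperforms heuristic methods on FLUX.1-dev and Wan2.1.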
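
The MiniPIC abstract notes that prefix caching reuses KV entries only when requests share identical prefixes. The following minimal Python sketch illustrates why that limits reuse for a recurring span. It is not MiniPIC's implementation and does not use vLLM's API: it assumes a simplified chained block-hash scheme for prefix caching and a content-only hash for spans. All names (`prefix_keys`, `span_key`, `CountingCache`, `BLOCK`) are hypothetical, and the stored strings stand in for real KV tensors.

```python
import hashlib

BLOCK = 4  # tokens per cache block (illustrative value, an assumption)


def _h(data: str) -> str:
    """Short stable hash used as a cache key."""
    return hashlib.sha256(data.encode()).hexdigest()[:12]


def prefix_keys(tokens):
    """Chained block hashes: each key depends on every token before the block."""
    keys, parent = [], ""
    full = len(tokens) - len(tokens) % BLOCK  # ignore a trailing partial block
    for i in range(0, full, BLOCK):
        parent = _h(parent + "|" + " ".join(tokens[i:i + BLOCK]))
        keys.append(parent)
    return keys


def span_key(span_tokens):
    """Content-only hash: the key ignores where the span sits in the prompt."""
    return _h(" ".join(span_tokens))


class CountingCache:
    """Toy cache that counts how often an entry had to be computed."""

    def __init__(self):
        self.store = {}
        self.misses = 0

    def lookup(self, key):
        if key not in self.store:
            self.misses += 1
            self.store[key] = f"kv[{key}]"  # placeholder for real KV tensors
        return self.store[key]


# The same 12-token code file appears after two different 4-token prefixes.
doc = "def add ( a , b ) : return a + b".split()
req1 = "system prompt one here".split() + doc
req2 = "another different prefix now".split() + doc

prefix_cache, span_cache = CountingCache(), CountingCache()
for req in (req1, req2):
    for key in prefix_keys(req):
        prefix_cache.lookup(key)
    span_cache.lookup(span_key(doc))

print("prefix-keyed misses:", prefix_cache.misses)
print("span-keyed misses:", span_cache.misses)
```

In this toy setup the prefix-keyed cache misses on all eight blocks, because the differing first block changes every later key, while the span-keyed cache computes the document once and reuses it. A real position-independent cache also has to handle positional encodings and attention to preceding context, which this sketch ignores.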

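Impact: These caching innovations are expected to lower inference costs and increase speed for large language models and diffusion models.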

Ranking rationale: This cluster contains two independent research papers detailing new caching techniques for AI models.


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

arXiv cs.AI TIER_1 English(EN) · Nathan Ordonez (IBM Research), Thomas Parnell (IBM Research) · 2026-06-12 04:00

MiniPIC: Flexible Position-Independent Caching in <100 Lines of Code

arXiv:2606.13126v1 Announce Type: cross Abstract: Retrieval-augmented and agentic workloads repeatedly prefill recurring predictable structured inputs (which we call "spans") such as documents and code files. Yet, prefix caching in engines such as vLLM cannot reuse their KV entri…
arXiv cs.CL TIER_1 English(EN) · Thomas Parnell · 2026-06-11 09:51

MiniPIC: Flexible Position-Independent Caching in <100 Lines of Code

Retrieval-augmented and agentic workloads repeatedly prefill recurring predictable structured inputs (which we call "spans") such as documents and code files. Yet, prefix caching in engines such as vLLM cannot reuse their KV entries unless they share identical prefixes with anoth…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-11 09:51

MiniPIC: Flexible Position-Independent Caching in <100 Lines of Code

Retrieval-augmented and agentic workloads repeatedly prefill recurring predictable structured inputs (which we call "spans") such as documents and code files. Yet, prefix caching in engines such as vLLM cannot reuse their KV entries unless they share identical prefixes with anoth…
arXiv cs.CV TIER_1 English(EN) · Mingkun Lei, Tong Zhao, Liangyu Yuan, Chi Zhang · 2026-06-12 04:00

Budget-Constrained Step-Level Diffusion Caching

arXiv:2606.13496v1 Announce Type: new Abstract: Step-level caching accelerates diffusion models by exploiting temporal redundancy across denoising steps. Existing methods make per-step cache decisions using threshold-based heuristics, without directly optimizing for final output …
arXiv cs.CV TIER_1 English(EN) · Chi Zhang · 2026-06-11 15:45

Budget-Constrained Step-Level Diffusion Caching

Step-level caching accelerates diffusion models by exploiting temporal redundancy across denoising steps. Existing methods make per-step cache decisions using threshold-based heuristics, without directly optimizing for final output quality. As a result, their inference latency va…
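
The contrast between threshold heuristics and a fixed compute budget can be sketched in a few lines of Python. This is not BudCache's algorithm, whose objective and optimizer are not described in the excerpts above. It assumes a hypothetical per-step "drift" score (how much the model output changes at each denoising step) and a simple proxy error (accumulated drift while a cached output is reused). The names `threshold_schedule`, `budget_schedule`, `drift_a`, and `drift_b` are illustrative only.

```python
from functools import lru_cache


def threshold_schedule(drift, tau):
    """Heuristic: recompute once drift accumulated since the last full step exceeds tau."""
    steps, acc = [0], 0.0
    for t in range(1, len(drift)):
        acc += drift[t]
        if acc > tau:
            steps.append(t)
            acc = 0.0
    return steps


def budget_schedule(drift, budget):
    """Pick exactly `budget` full steps (step 0 included) minimising the proxy error."""
    n = len(drift)
    if not 1 <= budget <= n:
        raise ValueError("budget must be between 1 and the number of steps")

    def segment_cost(i, j):
        # Steps i+1 .. j-1 reuse the output cached at step i.
        cost, acc = 0.0, 0.0
        for t in range(i + 1, j):
            acc += drift[t]
            cost += acc
        return cost

    @lru_cache(maxsize=None)
    def best(i, remaining):
        # Step i is computed; `remaining` full steps must still be placed after it.
        if remaining == 0:
            return segment_cost(i, n), ()
        options = []
        for j in range(i + 1, n - remaining + 1):
            tail_cost, tail = best(j, remaining - 1)
            options.append((segment_cost(i, j) + tail_cost, (j,) + tail))
        return min(options)

    cost, rest = best(0, budget - 1)
    return [0, *rest], cost


# Two hypothetical inputs with different drift profiles over 8 denoising steps.
drift_a = [0.0, 0.1, 0.1, 0.9, 0.8, 0.1, 0.1, 0.1]
drift_b = [0.4] * 8

for name, drift in (("a", drift_a), ("b", drift_b)):
    heuristic = threshold_schedule(drift, tau=0.5)
    planned, proxy_error = budget_schedule(drift, budget=3)
    print(name, "threshold steps:", heuristic, "budgeted steps:", planned)
```

With these toy inputs the threshold rule runs three full steps for one input and four for the other, so its cost depends on the input, whereas the budgeted schedule always runs exactly three. The proxy error here is only a stand-in for whatever quality measure a real method would optimize.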

