PulseAugur

New caching techniques improve LLM and diffusion model efficiency

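Researchers have developed MiniPIC, a new efficient caching method for large language model inference that requires fewer than 100 lines of code changes to existing systems such as vLLM. The method raises prefill throughput by 49% and significantly reduces latency for cached spans. Separately, a new technique for diffusion models called BudCache optimizes the caching strategy under a fixed compute budget to preserve output quality, and it outperforms heuristic methods on FLUX.1-dev and Wan2.1.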

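Impact: These caching innovations could lower inference cost and increase inference speed for large language models and diffusion models.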

Ranking rationale: This cluster contains two separate research papers that detail new caching techniques for AI models.


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

  1. arXiv cs.AI TIER_1 English(EN) · Nathan Ordonez (IBM Research), Thomas Parnell (IBM Research)

    MiniPIC: Flexible Position-Independent Caching in <100LOC

    arXiv:2606.13126v1 Announce Type: cross Abstract: Retrieval-augmented and agentic workloads repeatedly prefill recurring predictable structured inputs (which we call "spans") such as documents and code files. Yet, prefix caching in engines such as vLLM cannot reuse their KV entri…

  2. arXiv cs.CL TIER_1 English(EN) · Thomas Parnell

    MiniPIC: Flexible Position-Independent Caching in <100LOC

    Retrieval-augmented and agentic workloads repeatedly prefill recurring predictable structured inputs (which we call "spans") such as documents and code files. Yet, prefix caching in engines such as vLLM cannot reuse their KV entries unless they share identical prefixes with anoth…

  3. Hugging Face Daily Papers TIER_1 English(EN)

    MiniPIC: Flexible Position-Independent Caching in <100LOC

    Retrieval-augmented and agentic workloads repeatedly prefill recurring predictable structured inputs (which we call "spans") such as documents and code files. Yet, prefix caching in engines such as vLLM cannot reuse their KV entries unless they share identical prefixes with anoth…

  4. arXiv cs.CV TIER_1 English(EN) · Mingkun Lei, Tong Zhao, Liangyu Yuan, Chi Zhang

    Budget-Constrained Step-Level Diffusion Caching

    arXiv:2606.13496v1 Announce Type: new Abstract: Step-level caching accelerates diffusion models by exploiting temporal redundancy across denoising steps. Existing methods make per-step cache decisions using threshold-based heuristics, without directly optimizing for final output …

  5. arXiv cs.CV TIER_1 English(EN) · Chi Zhang

    Budget-Constrained Step-Level Diffusion Caching

    Step-level caching accelerates diffusion models by exploiting temporal redundancy across denoising steps. Existing methods make per-step cache decisions using threshold-based heuristics, without directly optimizing for final output quality. As a result, their inference latency va…
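
To make the idea of step-level caching under a fixed compute budget concrete, here is a toy Python sketch. It is not BudCache: the function names (`uniform_budget_schedule`, `run_with_cache`), the uniform spacing rule, and the additive update are all assumptions made for illustration. The abstracts above do not say how the schedule is actually chosen. The sketch only shows that, once the number of full model evaluations is fixed up front, the compute cost is known before sampling starts.

```python
from typing import Callable, List, Tuple


def uniform_budget_schedule(num_steps: int, budget: int) -> List[bool]:
    """Plan which denoising steps run the model (True) or reuse the cache (False).

    Hypothetical placeholder policy: spread exactly `budget` full evaluations
    evenly over `num_steps`. A real method would choose these steps by
    optimizing for output quality instead.
    """
    if not 1 <= budget <= num_steps:
        raise ValueError("budget must be between 1 and num_steps")
    # Step 0 is always computed, since nothing is cached yet.
    compute_steps = {(i * num_steps) // budget for i in range(budget)}
    return [t in compute_steps for t in range(num_steps)]


def run_with_cache(
    step_fn: Callable[[float, int], float],
    x: float,
    schedule: List[bool],
) -> Tuple[float, int]:
    """Run a toy sampling loop, reusing the last model output on cached steps."""
    cached = 0.0
    calls = 0
    for t, do_compute in enumerate(schedule):
        if do_compute:
            cached = step_fn(x, t)  # full (expensive) model evaluation
            calls += 1
        x = x + cached  # toy update; cached steps reuse the stale output
    return x, calls


if __name__ == "__main__":
    plan = uniform_budget_schedule(num_steps=10, budget=4)
    final_x, calls = run_with_cache(lambda x, t: -0.1 * x, 1.0, plan)
    print(plan, calls)
```

With `num_steps=10` and `budget=4`, the plan computes steps 0, 2, 5, and 7 and reuses the cached output everywhere else, so the loop makes exactly four model calls regardless of the input.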