PulseAugur
RedKnot: Efficient Long-Context LLM Serving with Head-Aware KV Reuse and SegPagedAttention

New Systems Tackle the Long-Context LLM Serving Bottleneck

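Two new research papers address context management for large language models (LLMs) serving long inputs, where the KV cache is a key bottleneck. RedKnot proposes a head-aware KV cache management system that decomposes the cache according to each attention head's attention pattern and effective range, with the aim of improving resource efficiency and scalability. TokenMizer models conversation history as a graph-structured knowledge graph and, by preserving relational structure, reports substantial token savings and higher decision recall.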

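Impact: These systems aim to improve the efficiency and scalability of LLM serving, which could support more complex, longer-context applications.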

Ranking rationale: Two academic papers propose new approaches to LLM infrastructure.


AI-generated summary · Google Gemini · based on 4 sources.

Sources [4]

  1. arXiv cs.AI · Yang Liu, ZhaoKai Luo, HuaYi Jin, ZhiYong Wang, RuoZhou He, BoYu Wang, Guanjie Chen, Junhao Hu

    RedKnot: Efficient Long-Context LLM Serving with Head-Aware KV Reuse and SegPagedAttention

    arXiv:2606.06256v1. Abstract: As the input length of large language model (LLM) serving continues to grow, the KV cache has become a dominant bottleneck in AI infrastructure. It limits GPU memory capacity, serving concurrency, cache reuse, and distributed scalab…

  2. arXiv cs.AI · Shweta Mishra

    TokenMizer: Graph-Structured Session Memory for Long-Horizon LLM Context Management

    arXiv:2606.06337v1. Abstract: Large language model (LLM) deployments for long-horizon tasks face a fundamental constraint: context windows are finite while productive work sessions are not. When history exceeds the Maximum Effective Context Window (MECW), critic…

  3. arXiv cs.AI · Shweta Mishra

    TokenMizer: Graph-Structured Session Memory for Long-Horizon LLM Context Management

    Large language model (LLM) deployments for long-horizon tasks face a fundamental constraint: context windows are finite while productive work sessions are not. When history exceeds the Maximum Effective Context Window (MECW), critical structured information - architectural decisi…

  4. arXiv cs.AI · Junhao Hu

    RedKnot: Efficient Long-Context LLM Serving with Head-Aware KV Reuse and SegPagedAttention

    As the input length of large language model (LLM) serving continues to grow, the KV cache has become a dominant bottleneck in AI infrastructure. It limits GPU memory capacity, serving concurrency, cache reuse, and distributed scalability. Several important problems, including pos…