HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

New HERMES and DSCache methods improve streaming video understanding through KV caching

By the PulseAugur editorial team · [3 sources] · 2026-05-05 04:00

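Researchers have developed new methods that make multimodal large language models (MLLMs) more efficient at understanding streaming video. One method, HERMES, treats the KV cache as a hierarchical memory system, which yields faster processing and higher accuracy while using less memory. Another, DSCache, decouples the past and present KV caches and uses position-independent encoding, which lets it handle unbounded streams and generalize to sequences longer than those the model saw during training.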
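As a rough illustration of what treating the KV cache as hierarchical memory can mean, the hypothetical Python sketch below keeps the full key/value tokens of the most recent frames and demotes older frames to mean-pooled summary tokens in a bounded long-term tier. The class `TwoTierKVCache`, the mean-pooling compression, the merge rule, and both capacities are assumptions made for illustration only; this is not the HERMES algorithm, whose details this summary does not describe.

```python
from collections import deque

Vector = list[float]


def mean_pool(vectors: list[Vector]) -> Vector:
    """Average equal-length vectors into one summary vector (assumed compression)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


class TwoTierKVCache:
    """Hypothetical two-tier KV memory for a video stream (illustration only).

    Tier 1 ("recent") keeps every key/value token of the newest frames.
    Tier 2 ("long_term") keeps one mean-pooled summary token per older frame.
    """

    def __init__(self, recent_frames: int = 4, long_term_slots: int = 8) -> None:
        self.recent_frames = recent_frames      # assumed capacity of the fine-grained tier
        self.long_term_slots = long_term_slots  # assumed capacity of the compressed tier
        self.recent: deque = deque()            # items: (frame_keys, frame_values)
        self.long_term: list = []               # items: (summary_key, summary_value)

    def add_frame(self, keys: list[Vector], values: list[Vector]) -> None:
        self.recent.append((keys, values))
        if len(self.recent) > self.recent_frames:
            # Demote the oldest recent frame to a single summary token.
            old_keys, old_values = self.recent.popleft()
            self.long_term.append((mean_pool(old_keys), mean_pool(old_values)))
        if len(self.long_term) > self.long_term_slots:
            # Merge the two oldest summaries so the long-term tier stays bounded.
            (k1, v1), (k2, v2) = self.long_term[0], self.long_term[1]
            self.long_term[:2] = [(mean_pool([k1, k2]), mean_pool([v1, v2]))]

    def token_count(self) -> int:
        return len(self.long_term) + sum(len(k) for k, _ in self.recent)

    def materialize(self) -> tuple[list[Vector], list[Vector]]:
        # Oldest (compressed) memory first, then the recent full-resolution tokens.
        keys = [k for k, _ in self.long_term]
        values = [v for _, v in self.long_term]
        for frame_keys, frame_values in self.recent:
            keys.extend(frame_keys)
            values.extend(frame_values)
        return keys, values


if __name__ == "__main__":
    cache = TwoTierKVCache(recent_frames=2, long_term_slots=2)
    for t in range(6):
        frame = [[float(t), float(t)], [t + 1.0, t + 1.0]]  # 2 tokens per frame, dim 2
        cache.add_frame(frame, frame)
    print(cache.token_count())
```

However many frames arrive, the cache never holds more than `long_term_slots + recent_frames * tokens_per_frame` tokens, which is the memory-bounding property the summary attributes to hierarchical designs; the specific compression and merge policies here are placeholders.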

Impact: New KV cache management techniques could significantly improve the real-time video analysis capabilities of multimodal LLMs.

Ranking rationale: Two arXiv papers introduce new architectures that use KV cache mechanisms for efficient streaming video understanding.

AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

arXiv cs.AI · Yiwei Wang · 2026-05-08 15:40

Semantic-Aware Adaptive Visual Memory for Streaming Video Understanding

Online streaming video understanding requires models to process continuous visual inputs and respond to user queries in real time, where the unbounded stream and unpredictable query timing turn memory management into a central challenge. Existing methods typically compress visual…

arXiv cs.CL · Haowei Zhang, Shudong Yang, Jinlan Fu, See-Kiong Ng, Xipeng Qiu · 2026-05-08 04:00

HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

arXiv:2601.14724v4 Announce Type: replace-cross Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated significant improvement in offline video understanding. However, extending these capabilities to streaming video inputs remains challenging…
arXiv cs.CV · Zhanzhong Pang, Dibyadip Chatterjee, Fadime Sener, Angela Yao · 2026-05-05 04:00

Decouple and Cache: KV Cache Construction for Streaming Video Understanding

arXiv:2605.01858v1 Announce Type: new Abstract: Streaming video understanding requires processing unbounded video streams with limited memory and computation, posing two key challenges. First, continuously constructing new and evicting old key-value (KV) caches is required for unb…
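The construct-and-evict loop this abstract mentions, together with the idea of decoupling past and present caches, can be illustrated with the hypothetical sketch below. New entries accumulate in a "present" buffer, finished chunks move into a fixed-capacity "past" cache that drops its oldest entries, and positions are assigned relative to the current window so they stay bounded however long the stream runs. The class name `DecoupledStreamCache`, the chunking scheme, and the window-relative positions are all assumptions; DSCache's actual position-independent encoding may work quite differently.

```python
from collections import deque


class DecoupledStreamCache:
    """Hypothetical past/present KV cache with bounded, window-relative positions.

    Entries are opaque KV items. Positions are assigned only when the cache is
    read, so they depend on an entry's slot in the window, not on its absolute
    frame index in the stream.
    """

    def __init__(self, past_capacity: int, chunk_size: int) -> None:
        self.past: deque = deque(maxlen=past_capacity)  # oldest entries are dropped automatically
        self.present: list = []
        self.chunk_size = chunk_size

    def push(self, kv_item: object) -> None:
        # New entries are constructed in the "present" buffer first.
        self.present.append(kv_item)
        if len(self.present) == self.chunk_size:
            # Commit the finished chunk to the past cache; maxlen handles FIFO eviction.
            self.past.extend(self.present)
            self.present = []

    def read(self) -> list[tuple[int, object]]:
        # Window-relative positions: 0 .. len(window) - 1, independent of stream length.
        window = list(self.past) + self.present
        return list(enumerate(window))


if __name__ == "__main__":
    cache = DecoupledStreamCache(past_capacity=4, chunk_size=2)
    for frame_index in range(1000):
        cache.push(f"kv@{frame_index}")
    print(cache.read())
```

After 1,000 pushes the largest position handed out is still 3, which shows why re-indexing positions within the window (rather than using absolute stream positions) avoids positions the model never saw in training; whether this matches the paper's approach is not established by the abstract.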
