PulseAugur
End-to-End Context Compression at Scale

New LCLMs Efficiently Compress Context for Long-Context Language Models

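Researchers have developed Latent Context Language Models (LCLMs), a new family of encoder-decoder compressors designed to address the memory bottleneck in long-context language model inference. Through extensive architecture search and pretraining on more than 350 billion tokens, these models achieve compression ratios of 1:4, 1:8, and 1:16. LCLMs outperform existing methods on general task performance, compression speed, and peak memory usage, making them an efficient backbone for long-horizon agents.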
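To make the memory argument concrete, here is a rough back-of-the-envelope sketch of how KV cache size scales with context length and what a 1:4, 1:8, or 1:16 reduction in sequence length would save. It assumes the decoder's cache scales linearly with the number of compressed (latent) positions, which the summary does not spell out. The model dimensions (`n_layers`, `n_kv_heads`, `head_dim`) and the 128,000-token context are hypothetical and not taken from the paper.

```python
from math import ceil

def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Approximate KV cache size for one sequence.

    Each layer stores one key and one value vector per position and KV head,
    hence the leading factor of 2. All default dimensions are hypothetical.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

def compressed_len(seq_len, ratio):
    """Number of latent positions left after 1:ratio compression (assumed linear)."""
    return ceil(seq_len / ratio)

if __name__ == "__main__":
    context_tokens = 128_000  # hypothetical long prompt
    baseline = kv_cache_bytes(context_tokens)
    print(f"uncompressed: {baseline / 2**30:.2f} GiB")
    for ratio in (4, 8, 16):
        latent = compressed_len(context_tokens, ratio)
        size = kv_cache_bytes(latent)
        print(f"1:{ratio}: {latent} positions, {size / 2**30:.2f} GiB")
```

The estimate covers only the cache held by the model that consumes the compressed context. It ignores the encoder's own cost and any quality loss, both of which the paper evaluates separately.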

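Impact: Introduces a new approach to efficient long-context processing that could enable more capable AI agents with a smaller memory footprint.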

Ranking rationale: This is a research paper detailing a new model architecture and its performance.


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

  1. arXiv cs.AI TIER_1 · Ang Li, Sean McLeish, Haozhe Chen, Nimit Kalra, Zaiqian Chen, Artem Gazizov, Venkata Anoop Suhas Kumar Morisetty, Bhavya Kailkhura, Harshitha Menon, Zhuang Liu, Brian R. Bartoldson, Tom Goldstein, Sanae Lotfi, Micah Goldblum, Pavel Izmailov

    End-to-End Context Compression at Scale

    arXiv:2606.09659v1 (cross-listed). Abstract: Long-context language model inference is bottlenecked by memory, as the KV cache grows with context length. Recent techniques to compress the KV cache fall short: they either degrade model quality substantially or require consider…

  2. arXiv cs.AI TIER_1 · Pavel Izmailov

    End-to-End Context Compression at Scale

    Long-context language model inference is bottlenecked by memory, as the KV cache grows with context length. Recent techniques to compress the KV cache fall short: they either degrade model quality substantially or require considerable time and compute to compress a single long pr…

  3. Hugging Face Daily Papers TIER_1

    End-to-End Context Compression at Scale

    Long-context language model inference is bottlenecked by memory, as the KV cache grows with context length. Recent techniques to compress the KV cache fall short: they either degrade model quality substantially or require considerable time and compute to compress a single long pr…

  4. Hugging Face Daily Papers TIER_1

    End-to-End Context Compression at Scale

    Encoder-decoder compression techniques are improved through architectural search and large-scale pretraining to create Latent Context Language Models that efficiently handle long contexts with better performance and memory usage compared to traditional KV cache methods.

  5. r/LocalLLaMA TIER_1 · /u/DeltaSqueezer

    LLM context compression at 16x beats KV cache

    Submitted by /u/DeltaSqueezer (https://www.reddit.com/user/DeltaSqueezer) · Link: https://venturebeat.com/data/context-compression-finally-works-in-production-new-research-cuts-llm-input-16x-without-the-accuracy-hit …