PulseAugur

Kwai Summary Attention compresses historical contexts for efficient long-context LLMs

Researchers have introduced Kwai Summary Attention (KSA), an attention mechanism that targets the cost of standard softmax attention in large language models, whose time complexity is quadratic in sequence length. KSA keeps the KV cache linear in sequence length but compresses historical context into learnable summary tokens. The approach seeks to balance memory cost against effective retention of long-distance dependencies, and is presented as an alternative to existing methods that either reduce the KV cache or rely on KV cache-friendly architectures.
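To make the memory trade-off concrete, here is a minimal back-of-the-envelope sketch in Python. It is not the paper's method: the model dimensions, the chunk size, the recent-token window, and both function names are hypothetical, chosen only to illustrate how replacing each block of older tokens with one summary token shrinks the cache while keeping it linear in sequence length.

```python
def kv_cache_bytes(num_tokens, num_layers=32, num_kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    """Bytes needed to cache keys and values for `num_tokens` positions.

    All model dimensions are illustrative assumptions, not values from the paper.
    """
    # Factor 2 accounts for storing both a key and a value per position.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens


def summarized_cache_tokens(seq_len, chunk_size=8, recent_window=1024):
    """Cached positions if every full chunk of older tokens becomes one summary token.

    Assumption: the most recent `recent_window` tokens stay uncompressed, and
    older tokens that do not yet fill a chunk are also kept as-is.
    """
    recent = min(seq_len, recent_window)
    history = seq_len - recent
    summaries = history // chunk_size   # one summary token per full chunk
    leftover = history % chunk_size     # not yet summarized
    return recent + summaries + leftover


if __name__ == "__main__":
    for seq_len in (8_192, 131_072, 1_048_576):
        full = kv_cache_bytes(seq_len)
        compact = kv_cache_bytes(summarized_cache_tokens(seq_len))
        print(f"{seq_len:>9} tokens: full {full / 2**30:8.2f} GiB, "
              f"summarized {compact / 2**30:8.2f} GiB")
```

Under these assumptions the summarized cache still grows linearly with sequence length, but with a slope of roughly 1/chunk_size of the full cache. How well the summary tokens preserve long-distance dependencies is the question the paper itself addresses.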

Impact: Introduces a new attention mechanism intended to reduce computational costs for long-context LLMs.

Ranking rationale: Academic paper introducing a novel attention mechanism for LLMs.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →


Sources [2]

  1. arXiv cs.CL TIER_1 English(EN) · Chenglong Chu, Guorui Zhou, Guowang Zhang, Han Li, Hao Peng, Hongtao Cheng, Jian Liang, Jiangxia Cao, Kun Gai, Lingzhi Zhou, Lu Ren, Qi Zhang, Ruiming Tang, Ruitao Wang, Xinchen Luo, Yi Su, Zhiyuan Liang, Ziqi Wang, Boyang Ding, Chengru Song, Dunju Zang, ·

    Kwai Summary Attention Technical Report

    arXiv:2604.24432v1. Abstract: Long-context ability, has become one of the most important iteration direction of next-generation Large Language Models, particularly in semantic understanding/reasoning, code agentic intelligence and recommendation system. However,…

  2. arXiv cs.CL TIER_1 English(EN) · Zixing Zhang ·

    Kwai Summary Attention Technical Report

    Long-context ability, has become one of the most important iteration direction of next-generation Large Language Models, particularly in semantic understanding/reasoning, code agentic intelligence and recommendation system. However, the standard softmax attention exhibits quadrat…