PulseAugur
New Methods Improve AI Attention Efficiency in Video and LLMs

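Researchers have developed several new methods for making attention mechanisms in AI models more efficient. One, SimInsert, targets seamless video object insertion by decoupling single-frame editing from temporal propagation. Another group of techniques, including PBS-Attn and RetroAttention, aims to optimize attention for large language models (LLMs) that process long contexts by reducing computational complexity and improving KV cache efficiency. In addition, DFSAttn and RTPurbo offer new ways to achieve sparse attention: DFSAttn through dynamic fine-grained sparsification for video generation, and RTPurbo by converting existing full-attention models into sparse ones with minimal training.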
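
As a rough illustration of the general idea behind sparse attention, the sketch below compares full attention for one query with a block-sparse variant that scores only a few key blocks. It is a generic toy example, not the method of any paper listed on this page. The function names, the mean-key proxy used to rank blocks, and the `block_size` and `top_blocks` parameters are all assumptions made for illustration.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_attention(q, keys, values):
    # Full attention for a single query: every key is scored.
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def block_sparse_attention(q, keys, values, block_size, top_blocks):
    # Hypothetical block selection: rank each key block by the dot product
    # of the query with the block's mean key, keep the best `top_blocks`,
    # then run ordinary attention over the kept positions only.
    n = len(keys)
    starts = list(range(0, n, block_size))

    def block_mean(start):
        block = keys[start:start + block_size]
        return [sum(k[d] for k in block) / len(block) for d in range(len(q))]

    ranked = sorted(starts, key=lambda s: dot(q, block_mean(s)), reverse=True)
    kept = sorted(ranked[:top_blocks])
    idx = [i for s in kept for i in range(s, min(s + block_size, n))]
    out = dense_attention(q, [keys[i] for i in idx], [values[i] for i in idx])
    return out, idx


if __name__ == "__main__":
    keys = [[float(i % 3), float(i % 5)] for i in range(8)]
    values = [[float(i), 1.0] for i in range(8)]
    q = [1.0, 0.5]
    print(dense_attention(q, keys, values))
    print(block_sparse_attention(q, keys, values, block_size=2, top_blocks=2))
```

When `top_blocks` covers every block, the sparse function scores the same keys as the dense one and returns the same output. With fewer blocks it scores fewer keys per query, which is where the savings come from, at the cost of an approximation whose quality depends on how well the proxy ranks blocks.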

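Impact: These advances in attention mechanisms could lead to more efficient and more capable AI models for tasks ranging from video editing to long-context language processing.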

Ranking rationale: Multiple research papers introduce new techniques for AI attention mechanisms.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 15 sources. How we write summaries →

Sources [15]

  1. arXiv cs.AI TIER_1 English(EN) · Aditya Desai, Kumar Krishna Agrawal, Shuo Yang, Alejandro Cuadron, Luis Gaspar Schroeder, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica ·

    vAttention: Verified Sparse Attention

    arXiv:2510.05688v2 Announce Type: replace-cross Abstract: State-of-the-art sparse attention methods for reducing decoding latency fall into two main categories: approximate top-$k$ (and its extension, top-$p$) and recently introduced sampling-based estimation. However, these appr…

  2. arXiv cs.LG TIER_1 English(EN) · Weikang Meng, Yadan Luo, Liangyu Huo, Yingjian Li, Yaowei Wang, Xin Li, Zheng Zhang ·

    Norm$\times$Direction: Restoring the Missing Query Norm in Vision Linear Attention

    arXiv:2506.21137v3 Announce Type: replace Abstract: Linear attention mitigates the quadratic complexity of softmax attention but suffers from a critical loss of expressiveness. We identify two primary causes: (1) The normalization operation cancels the query norm, which breaks th…

  3. arXiv cs.AI TIER_1 English(EN) · Xinghao Wang, Pengyu Wang, Xiaoran Liu, Fangxu Liu, Jason Chu, Kai Song, Xipeng Qiu ·

    Prism: Spectral-Aware Block-Sparse Attention

    arXiv:2602.08426v2 Announce Type: replace-cross Abstract: Block-sparse attention is promising for accelerating long-context LLM pre-filling, yet identifying relevant blocks efficiently remains a bottleneck. Existing methods typically employ coarse-grained attention as a proxy for…

  4. arXiv cs.AI TIER_1 English(EN) · Florent Draye, Anson Lei, Hsiao-Ru Pan, Ingmar Posner, Bernhard Schölkopf ·

    Intrinsically Interpretable Attention via Sparse Post-Training

    arXiv:2512.05865v5 Announce Type: replace-cross Abstract: We introduce a simple post-training method that makes transformer attention sparse without sacrificing performance. Applying a flexible sparsity regularisation under a constrained-loss objective, we show on models up to 7B…

  5. arXiv cs.AI TIER_1 English(EN) · Spandan Pratyush ·

    Grammatically-Guided Sparse Attention for Efficient and Interpretable Transformers

    arXiv:2605.24518v1 Announce Type: cross Abstract: The quadratic complexity of self-attention in Transformer models remains a significant bottleneck for processing long sequences and deploying large language models efficiently. For this approach, there has been significant researc…

  6. arXiv cs.AI TIER_1 English(EN) · Xinghao Wang, Pengyu Wang, Dong Zhang, Chenkun Tan, Shaojun Zhou, Zhaoxiang Liu, Shiguo Lian, Fangxu Liu, Kai Song, Xipeng Qiu ·

    Sparser Block-Sparse Attention via Token Permutation

    arXiv:2510.21270v2 Announce Type: replace-cross Abstract: Scaling the context length of large language models (LLMs) offers significant benefits but is computationally expensive. This expense stems primarily from the self-attention mechanism, whose $O(N^2)$ complexity with respec…

  7. arXiv cs.AI TIER_1 English(EN) · Xinyu Chen, Yuyi Qian, Jiang Lin, Shenyi Wang, Gao Wang, Zhiqiu Zhang, Jizhi Zhang, Mingjie Wang, Qiang Tang, Qian Wang, Song Wu, Zili Yi ·

    SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion

    arXiv:2605.23245v1 Announce Type: cross Abstract: Video object insertion requires ensuring spatio-temporal coherence and interactive realism, extending far beyond simple content placement. However, current approaches are often hindered by a reliance on explicit motion engineering…

  8. arXiv cs.AI TIER_1 English(EN) · Seonghwan Choi, Beomseok Kang, Dongwon Jo, Jae-Joon Kim ·

    Retrospective Sparse Attention for Efficient Long-Context Generation

    arXiv:2508.09001v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are increasingly deployed in long-context tasks such as reasoning, code generation, and multi-turn dialogue. However, inference over extended contexts is bottlenecked by the Key-Value (KV) cach…

  9. Hugging Face Daily Papers TIER_1 English(EN) ·

    Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

    RTPurbo leverages intrinsic sparsity in full-attention LLMs to achieve efficient long-context inference with minimal training overhead, enabling significant speedups while maintaining near-lossless accuracy.

  10. arXiv cs.CV TIER_1 English(EN) · Chengtao Lv, Yumeng Shi, Yushi Huang, Ruihao Gong, Shen Ren, Wenya Wang ·

    Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention

    arXiv:2602.04789v2 Announce Type: replace Abstract: Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient deployment. While existing sparse attenti…

  11. arXiv cs.CV TIER_1 English(EN) · Rishabh Sabharwal, Ram Samarth B B, Parikshit Singh Rathore, Punit Rathore ·

    STEAM: Squeeze and Transform Enhanced Attention Module

    arXiv:2412.09023v2 Announce Type: replace Abstract: Channel and spatial attention mechanisms introduced by earlier works enhance the representation abilities of deep convolutional neural networks (CNNs) but often lead to increased parameter and computation costs. While recent app…

  12. arXiv cs.CV TIER_1 English(EN) · Jie Hu, Zixiang Gao, Yutong He, Kun Yuan ·

    DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation

    arXiv:2605.23445v1 Announce Type: new Abstract: Diffusion transformers have achieved remarkable success in high-quality video generation, yet their reliance on spatiotemporal 3D full attention incurs prohibitive computational cost due to the quadratic complexity of attention. Blo…

  13. arXiv cs.CV TIER_1 English(EN) · Kun Yuan ·

    DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation

    Diffusion transformers have achieved remarkable success in high-quality video generation, yet their reliance on spatiotemporal 3D full attention incurs prohibitive computational cost due to the quadratic complexity of attention. Block sparse attention is a common approach to miti…

  14. arXiv cs.CV TIER_1 English(EN) · Zili Yi ·

    SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion

    Video object insertion requires ensuring spatio-temporal coherence and interactive realism, extending far beyond simple content placement. However, current approaches are often hindered by a reliance on explicit motion engineering or resource-intensive retraining, restricting the…

  15. r/LocalLLaMA TIER_1 English(EN) · /u/pmttyji ·

    Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

    Long-context inference in large language models is bottlenecked by the quadratic cost of full attention. Existing efficient alternatives often rely either on native sparse training or on heuristic token eviction, creating an undesira…