PulseAugur

GQA

PulseAugur coverage of GQA — every cluster mentioning GQA across labs, papers, and developer communities, ranked by signal.

Total · 30d: 13 (13 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 12 (12 over 90d)
SENTIMENT · 30D: 5 days with sentiment data

RECENT · PAGE 1/1 · 13 TOTAL
  1. RESEARCH · CL_111257 ·

    PersistentKV optimizes LLM serving on commodity GPUs with new scheduling techniques

    A new paper introduces PersistentKV, a system designed to optimize the serving of large language models (LLMs) with long contexts on commodity GPUs. PersistentKV employs page-aware decode scheduling and a native block-t…

  2. TOOL · CL_105112 ·

    Kamera method enhances multimodal AI efficiency with position-invariant KV cache

    Researchers have developed a new method called Kamera that addresses the inefficiency of multimodal AI agents re-encoding information from repeated video frames or UI screenshots. This technique introduces a training-fr…

  3. RESEARCH · CL_105983 ·

    Grouped Query Experts enhance Transformer efficiency by selectively activating query heads

    Researchers have introduced Grouped Query Experts (GQE), a novel mixture-of-experts layer designed to enhance the efficiency of Transformer models, particularly at long context lengths. GQE builds upon Grouped-Query Att…

  4. TOOL · CL_95483 ·

    xFormers library enables memory-efficient Transformer models on GPUs

    This tutorial demonstrates how to build memory-efficient Transformer models using the xFormers library on GPUs. It covers implementing and comparing memory-efficient attention with standard attention, analyzing techniqu…

  5. RESEARCH · CL_82210 ·

    Kwai releases Keye-VL-2.0 for long-video understanding

    Kwai has released Keye-VL-2.0-30B-A3B, an open-source multimodal foundation model designed for long-video understanding and agentic intelligence. This model utilizes DeepSeek Sparse Attention to process up to 256K conte…

  6. TOOL · CL_56286 ·

    New GQLA Attention Optimizes LLMs for Diverse Hardware

    Researchers have developed Group-Query Latent Attention (GQLA), a novel attention mechanism designed to optimize large language model decoding across diverse hardware. GQLA offers two algebraically equivalent decoding p…

  7. TOOL · CL_43642 ·

    OpenMythos tutorial shows recurrent transformers for deeper computation

    The OpenMythos framework enables the construction of advanced recurrent-depth transformer models, demonstrated through a tutorial using Google Colab. This tutorial showcases building and comparing Multi-head Latent Attention…

  8. TOOL · CL_26875 ·

    Transformer LLM Architectures Converge on Standard Stack

    A recent analysis of 53 large language models from 2017 to 2025 reveals a significant convergence in transformer architectures. Key elements of this de facto standard include pre-normalization (RMSNorm), Rotary Position…

  9. RESEARCH · CL_09211 ·

    IBM releases Granite 4.1 LLMs with 512K context and Apache 2.0 license

    IBM has released the Granite 4.1 family of large language models, comprising 3B, 8B, and 30B parameter versions. These models were trained on approximately 15 trillion tokens through a five-stage pre-training process th…

  10. RESEARCH · CL_08619 ·

    BLASST paper introduces dynamic sparse attention for faster LLM inference

    Researchers have developed BLASST, a novel sparse attention mechanism designed to accelerate inference for large language models with long contexts. This drop-in solution dynamically skips attention blocks using a simpl…

  11. RESEARCH · CL_06270 ·

    Kwai Summary Attention compresses historical contexts for efficient long-context LLMs

    Researchers have introduced Kwai Summary Attention (KSA), a novel attention mechanism designed to address the quadratic time complexity of standard softmax attention in large language models. KSA aims to maintain a line…

  12. RESEARCH · CL_04553 ·

    DeepSeek benchmarks MLA vs GQA on A100, revealing bandwidth-quality tradeoff

    A technical analysis explores DeepSeek's decision to use MLA (Multi-head Latent Attention) over GQA (Grouped-Query Attention) in their models. The author highlights this choice as a strategic trade-off between compu…

  13. RESEARCH · CL_03769 ·

    DeepSeek-V4, LoRA, and other LLM techniques detailed in new blogs

    A series of six blog posts has been published on Outcome School, detailing fundamental components of contemporary large language models. The posts cover technical concepts such as RMSNorm, DeepSeek-V4, LoRA, RoPE, GQA, …