PulseAugur

KV cache

PulseAugur coverage of KV cache: every cluster mentioning KV cache across labs, papers, and developer communities, ranked by signal.

Total · 30d: 59 (59 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 41 (41 over 90d)
Sentiment · 30d: 16 days with sentiment data

RECENT · PAGE 1/3 · 59 TOTAL
  1. RESEARCH · CL_109420

    Engram pioneers AI 'memory' by baking knowledge into weights, not just context

    AI startup Engram is developing a novel approach to AI memory and continual learning, aiming to embed specialized knowledge directly into model weights rather than relying solely on retrieval-augmented generation (RAG) …

  2. RESEARCH · CL_108502

    New EpiKV method optimizes LLM KV cache, boosting efficiency and context length

    A new research paper introduces EpiKV, a method for optimizing KV cache eviction in large language models. Unlike previous methods that rely on attention weights, EpiKV uses an "epiphany score" derived from changes in t…

  3. RESEARCH · CL_109581

    ASAP framework enhances ML hyperparameter optimization via agent-system co-design

    Researchers have developed ASAP, a novel agent-system co-design framework for hyperparameter optimization (HPO) in machine learning experiments. ASAP addresses limitations of existing HPO tools by integrating a diverse …

  4. RESEARCH · CL_107863

    Nexus Sampling improves LLM KV cache eviction, reducing memory use

    Researchers have developed Nexus Sampling, a novel method for managing KV cache eviction in large language models, particularly for long-context and agentic workloads. This training-free approach pairs Nexus scoring wit…

  5. TOOL · CL_105112

    Kamera method enhances multimodal AI efficiency with position-invariant KV cache

    Researchers have developed a new method called Kamera that addresses the inefficiency of multimodal AI agents re-encoding information from repeated video frames or UI screenshots. This technique introduces a training-fr…

  6. RESEARCH · CL_106564

    New methods enhance LLM efficiency via KV cache compression and quantization

    Researchers have developed new methods to improve the efficiency of large language models (LLMs) by compressing their key-value (KV) caches. One approach, InfoKV, uses information-theoretic signals like predictive uncer…

  7. TOOL · CL_104774

    Keyless Attention mechanism halves KV cache and boosts transformer efficiency

    Researchers have introduced Keyless Attention, a novel attention mechanism for transformers that eliminates the key projection entirely, operating solely on queries and values. This approach results in a Value-Only Cach…

  8. TOOL · CL_106135

    KV cache memory problem plagues LLM serving, vLLM's PagedAttention offers solution

    The KV cache is a critical component in LLM inference: it stores the key and value vectors already computed for earlier tokens so they do not have to be recomputed at each new decoding step. However, its memory footprint can become a significant bottleneck, especially in production …

  9. FRONTIER RELEASE · CL_103597

    Baidu releases Unlimited OCR with constant KV cache for long documents

    Baidu has released Unlimited OCR, a 3-billion-parameter Mixture-of-Experts model designed for efficient long-document parsing. The model utilizes Reference Sliding Window Attention (R-SWA) to maintain a constant KV cach…

  10. TOOL · CL_99437

    AWS SageMaker enhances AI inference monitoring with CloudWatch dashboard

    Amazon SageMaker has enhanced its monitoring capabilities for generative AI inference endpoints by integrating detailed metrics and a new Insights dashboard within Amazon CloudWatch. This upgrade allows users to more ef…

  11. RESEARCH · CL_99962

    New 'Execution-State Capsules' Speed Up On-Device AI Serving

    Researchers have introduced "execution-state capsules," a novel method for managing and reusing the complete state of AI models during on-device serving. This approach allows for rapid checkpointing and restoration of a…

  12. TOOL · CL_96117

    New research enables editable and composable KV cache for LLMs

    A new research paper introduces a novel method for optimizing KV cache usage in large language models, enabling editable and composable notes within the prefill stage. This approach allows for efficient editing of model…

  13. RESEARCH · CL_93469 · 7 sources tracked

    New methods boost LLM inference speed via speculative decoding

    Researchers are developing advanced speculative decoding techniques to accelerate large language model (LLM) inference. JetFlow, a new framework, improves speed by combining drafting efficiency with causal conditioning,…

  14. TOOL · CL_93124

    CogGuard framework offers proactive warnings for edge AI services

    Researchers have developed CogGuard, a new framework designed to provide proactive warnings for edge intelligent services. This system aims to predict task completion success while adhering to strict latency and privacy…

  15. RESEARCH · CL_95768

    Variable-Width Transformers Offer Improved Efficiency in Language Models

    Researchers have proposed a novel transformer architecture, termed the '> <former' or 'x-shaped' architecture, that deviates from the standard uniform width across all layers. This new design allocates wider capacity to…

  16. TOOL · CL_92700

    LLM Architectures Prioritize Long-Context Efficiency

    New large language model architectures are focusing on improving efficiency with long contexts. Recent open-weight model releases are implementing architectural modifications to decrease the size of the KV cache, which …

  17. RESEARCH · CL_93573

    KVEraser offers efficient KV cache editing for LLMs

    Researchers have developed KVEraser, a novel method for efficiently erasing specific information from the KV cache of large language models. This technique addresses the challenge of localized context editing, where rem…

  18. RESEARCH · CL_93251

    New LLM KV Cache Compression Methods Tackle Safety and Efficiency

    Researchers are developing new methods to compress the Key-Value (KV) cache in large language models (LLMs) to reduce memory usage and improve inference efficiency. AnchorKV focuses on safety by biasing token retention …

  19. RESEARCH · CL_86566

    AI agents could buy precomputed KV caches to save compute

    Researchers propose a novel method to reduce AI agent computation by precomputing and selling Key-Value (KV) caches for documents. This approach aims to eliminate redundant prefill computations, which are the most compu…

  20. COMMENTARY · CL_71784

    Qwen 3.6 35B model excels with KV cache in agentic tasks

    A user on r/LocalLLaMA found that the Qwen 3.6 35B model significantly outperforms the 27B version, particularly in agentic tasks, when using KV cache. This user initially favored the 27B model for its perceived intelli…
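Item 8 summarizes the KV cache as a store of past computations that spares recomputation for each new token. The sketch below illustrates that idea under simplifying assumptions: a single attention head, toy random projection matrices, and pure Python lists. All names (`KVCache`, `decode_with_cache`, `decode_without_cache`) are hypothetical. It checks that decoding with a cache produces the same outputs as recomputing every key and value from scratch at each step:

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query over cached keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


class KVCache:
    """Hypothetical toy cache: one key and one value vector per past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_with_cache(xs, Wq, Wk, Wv):
    cache = KVCache()
    outputs = []
    for x in xs:
        # Project only the newest token; earlier keys/values are reused from the cache.
        cache.append(matvec(Wk, x), matvec(Wv, x))
        outputs.append(attend(matvec(Wq, x), cache.keys, cache.values))
    return outputs


def decode_without_cache(xs, Wq, Wk, Wv):
    outputs = []
    for t in range(len(xs)):
        prefix = xs[: t + 1]
        # Recompute keys and values for the whole prefix at every step.
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        outputs.append(attend(matvec(Wq, xs[t]), keys, values))
    return outputs


rng = random.Random(0)
d = 4


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


Wq, Wk, Wv = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)
xs = rand_matrix(6, d)  # six toy token embeddings
cached = decode_with_cache(xs, Wq, Wk, Wv)
full = decode_without_cache(xs, Wq, Wk, Wv)
max_diff = max(abs(a - b) for ra, rb in zip(cached, full) for a, b in zip(ra, rb))
print(f"max difference: {max_diff:.2e}")
```

The cached version does one key/value projection per token instead of one per token per step, which is exactly the work the cache saves; the price is that the stored vectors grow linearly with sequence length.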
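Several items (6, 7, 9, 16) concern how large the KV cache gets. A back-of-the-envelope estimate, assuming standard multi-head or grouped-query attention where every layer caches one key vector and one value vector per KV head per token, multiplies those dimensions together. The helper name and the model shapes below are hypothetical and are not taken from any model in the list above:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Bytes to hold keys and values (the leading 2) for every layer, KV head, and token.

    bytes_per_elem=2 assumes 16-bit (fp16/bf16) storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


GIB = 2 ** 30
# Hypothetical shapes: 32 layers, head_dim 128, a 32k-token context, batch of 1.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=32_768)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=32_768)
print(f"MHA: {mha / GIB:.1f} GiB, GQA with 8 KV heads: {gqa / GIB:.1f} GiB")
```

Under these assumptions the cache is 16 GiB with 32 KV heads and 4 GiB with 8, which is why architectural changes that cut KV heads matter at long context. Storing elements in 8 bits halves the total again; caching values only, as Keyless Attention (item 7) is described as doing, removes the leading factor of 2; and a fixed-size attention window, as in item 9, caps `seq_len` at the window length regardless of document size.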
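Item 8 points to vLLM's PagedAttention as a remedy for KV cache memory pressure. Its central idea is to store each sequence's cache in fixed-size blocks that are allocated on demand and located through a per-sequence block table, much like pages in virtual memory. The toy allocator below illustrates only that bookkeeping; `PagedKVAllocator` and its methods are hypothetical and are not vLLM's API, and the attention kernel that reads keys and values through the block table is omitted:

```python
class PagedKVAllocator:
    """Toy block table mapping each sequence's logical KV blocks to physical blocks."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size                # tokens per block
        self.free_blocks = list(range(num_blocks))  # pool of physical block ids
        self.block_tables = {}                      # seq_id -> physical block ids, in order
        self.lengths = {}                           # seq_id -> tokens stored so far

    def append_token(self, seq_id):
        """Reserve a cache slot for one more token; returns (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:  # no block yet, or the last block is full
            if not self.free_blocks:
                raise MemoryError("no free KV cache blocks")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def release(self, seq_id):
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


alloc = PagedKVAllocator(num_blocks=4, block_size=16)
for _ in range(20):
    alloc.append_token("a")  # 20 tokens -> 2 blocks, the second only partly filled
for _ in range(5):
    alloc.append_token("b")  # 5 tokens -> 1 block
print(alloc.block_tables, len(alloc.free_blocks))
alloc.release("a")  # sequence "a" finished: its blocks become reusable at once
```

Because blocks are handed out only as tokens arrive, each sequence in this sketch wastes at most the unused tail of its last block, instead of a contiguous region reserved up front for the maximum possible length.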
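Items 2, 4, and 18 describe eviction or retention policies that decide which cached tokens to keep under a fixed memory budget. EpiKV's epiphany score, Nexus scoring, and AnchorKV's safety bias are not reproduced here. As a generic stand-in, the sketch below implements a common baseline: always keep the most recent tokens, then fill the rest of the budget with the older tokens that have the highest importance score (for example, accumulated attention weight). The function name and its parameters are hypothetical:

```python
def select_tokens_to_keep(scores, budget, recent=2):
    """Return sorted indices of cached tokens to retain under a fixed budget.

    scores[i] is an importance score for cached token i (assumed here to be
    accumulated attention weight); the `recent` newest tokens are always kept.
    """
    n = len(scores)
    if n <= budget:
        return list(range(n))  # everything fits, nothing to evict
    recent = min(recent, budget)
    keep_recent = list(range(n - recent, n))
    older = range(n - recent)
    # Fill the remaining budget with the highest-scoring older tokens.
    top_older = sorted(older, key=lambda i: scores[i], reverse=True)[: budget - recent]
    return sorted(top_older + keep_recent)


scores = [0.9, 0.1, 0.5, 0.05, 0.3, 0.2]
print(select_tokens_to_keep(scores, budget=4, recent=2))  # → [0, 2, 4, 5]
```

Tokens 1 and 3 are evicted because they have the lowest scores outside the protected recent window. The scoring signal is the part the listed papers change; the keep-top-k-under-a-budget structure stays the same.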