ThriftAttention boosts AI long-context efficiency with selective precision

By PulseAugur Editorial · [2 sources] · 2026-05-25 04:00

Researchers have developed ThriftAttention, a method for making long-context attention in AI models more efficient. It runs most of the attention computation at low precision (FP4) and reserves higher precision (FP16) for the most critical query-key interactions. By applying FP16 to only about 5% of the most important blocks, ThriftAttention reduces the quality degradation typically seen with low-bit attention in long-context scenarios, recovering nearly 90% of the gap to full-FP16 quality.
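
To make the idea concrete, here is a minimal NumPy sketch of selective mixed precision over attention score blocks. It is an illustration under stated assumptions, not the paper's implementation: FP4 is only simulated by rounding to the E2M1 value grid with a per-group scale, the block-importance proxy (scores between mean-pooled query and key blocks) is hypothetical because the summary does not say how important blocks are chosen, attention is non-causal for brevity, and the names `fake_fp4` and `selective_attention` are invented for this example.

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 float; the sign is handled separately.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def fake_fp4(x, group=16):
    """Simulate block-scaled FP4: per-group scale, then round to the nearest grid value."""
    g = x.reshape(-1, group)
    scale = np.abs(g).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale[scale == 0] = 1.0  # avoid dividing by zero for all-zero groups
    scaled = g / scale
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return (np.sign(scaled) * FP4_GRID[idx] * scale).reshape(x.shape)


def selective_attention(q, k, v, block=32, keep_frac=0.05):
    """Non-causal attention with simulated-FP4 scores, except for a few FP16 blocks."""
    n, d = q.shape
    nb = n // block  # assumes n is a multiple of `block`

    # Low-precision path: every score block starts from simulated-FP4 inputs.
    scores = fake_fp4(q) @ fake_fp4(k).T

    # Hypothetical importance proxy (an assumption, not taken from the paper):
    # the score between mean-pooled query blocks and mean-pooled key blocks.
    q_pool = q.reshape(nb, block, d).mean(axis=1)
    k_pool = k.reshape(nb, block, d).mean(axis=1)
    importance = q_pool @ k_pool.T

    # Keep roughly `keep_frac` of all blocks for the high-precision path.
    n_keep = max(1, int(round(keep_frac * nb * nb)))
    thresh = np.sort(importance, axis=None)[-n_keep]
    hi_mask = importance >= thresh

    # High-precision path: recompute only the selected blocks in FP16.
    for i, j in zip(*np.nonzero(hi_mask)):
        rows = slice(i * block, (i + 1) * block)
        cols = slice(j * block, (j + 1) * block)
        scores[rows, cols] = q[rows].astype(np.float16) @ k[cols].astype(np.float16).T

    # Standard scaled softmax over the mixed-precision scores.
    scores = scores / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, hi_mask


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((256, 64)) for _ in range(3))
out, hi_mask = selective_attention(q, k, v)
print(out.shape, int(hi_mask.sum()), "of", hi_mask.size, "blocks recomputed in FP16")
```

The sketch only demonstrates the control flow of mixing precisions per block. It says nothing about speed, since a real kernel would need hardware FP4 support rather than simulated rounding in NumPy.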

IMPACT Improves efficiency for long-context AI models, potentially lowering inference costs and enabling broader use of models with long context windows.

RANK_REASON The cluster contains an academic paper detailing a new method for improving AI model efficiency.

paper
infra

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.LG TIER_1 English(EN) · Joe Sharratt · 2026-05-25 04:00

ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention

arXiv:2605.23081v1 Announce Type: new Abstract: Efficient attention algorithms are critical to mitigate the quadratic cost of attention in long-context workloads. Prior work utilises block-scaled quantisation techniques on Blackwell GPUs to move attention computation to 4-bit pre…

r/LocalLLaMA TIER_1 English(EN) · /u/miserlou · 2026-05-25 21:14

ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention

submitted by /u/miserlou (https://www.reddit.com/user/miserlou) · [link] https://arxiv.org/pdf/2605.23081 · https://www.reddit.com/r/LocalLLaMA/comments/1tnm56l/thriftattention_selective_mixed_precision_…
