PulseAugur / Brief

Last 24 hours · 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention

    Researchers have developed ThriftAttention, a method for making long-context attention in AI models more efficient. It runs most attention computation at low precision (FP4) and switches to higher precision (FP16) only for the most critical query-key interactions. By reserving FP16 for roughly the 5% of blocks that matter most, ThriftAttention sharply reduces the quality degradation that low-bit attention typically suffers in long-context settings, recovering nearly 90% of the performance gap between low-bit attention and full FP16.

    IMPACT Improves efficiency for long-context AI models, potentially lowering inference costs and enabling broader use of models with long context windows.
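    The selection idea can be illustrated with a small pure-Python simulation. This is a minimal sketch under stated assumptions, not ThriftAttention's implementation: FP4 is simulated with the E2M1 value grid and one absmax scale per vector, FP16 is simulated by rounding inputs through the `struct` module's half-precision `e` format, and tiles are chosen by their peak FP4 score magnitude, which is a hypothetical importance criterion. The function names, the 8-token block size, and the two-pass structure are illustrative only.

```python
import math
import random
import struct

# Non-negative magnitudes representable in FP4 E2M1 (2 exponent bits, 1 mantissa bit).
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_fp4(vec):
    """Simulate FP4 (E2M1) storage with one absmax scale per vector (assumed scheme)."""
    scale = max(abs(x) for x in vec) / 6.0 or 1.0  # map the largest |x| to 6.0
    out = []
    for x in vec:
        # Snap |x| / scale to the nearest grid magnitude, then restore sign and scale.
        mag = min(FP4_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(math.copysign(mag * scale, x))
    return out


def round_fp16(vec):
    """Round each value to IEEE 754 half precision via struct's 'e' format."""
    return [struct.unpack("<e", struct.pack("<e", x))[0] for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def abs_error(a, b):
    """Total absolute difference between two score matrices."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def mixed_precision_scores(q, k, block=8, hi_frac=0.05):
    """Hypothetical selective-precision QK^T / sqrt(d).

    Every score is first computed from FP4-quantized Q and K. Tiles of
    block x block scores are ranked by their peak |score|, and the top
    hi_frac of tiles are recomputed from FP16-rounded inputs.
    Returns (scores, list of (query_block_start, key_block_start) upgraded).
    """
    n_q, n_k, d = len(q), len(k), len(q[0])
    inv_sqrt_d = 1.0 / math.sqrt(d)
    q4 = [quantize_fp4(row) for row in q]
    k4 = [quantize_fp4(row) for row in k]
    # Pass 1: low-precision scores for every query-key pair.
    scores = [[dot(qi, kj) * inv_sqrt_d for kj in k4] for qi in q4]

    # Rank tiles by the largest score magnitude seen in the FP4 pass.
    tiles = []
    for qb in range(0, n_q, block):
        for kb in range(0, n_k, block):
            peak = max(abs(scores[i][j])
                       for i in range(qb, min(qb + block, n_q))
                       for j in range(kb, min(kb + block, n_k)))
            tiles.append((peak, qb, kb))
    n_hi = math.ceil(hi_frac * len(tiles))
    chosen = sorted(tiles, reverse=True)[:n_hi]

    # Pass 2: recompute only the chosen tiles from FP16-rounded Q and K
    # (the dot products themselves are accumulated in Python floats).
    for _, qb, kb in chosen:
        for i in range(qb, min(qb + block, n_q)):
            qi = round_fp16(q[i])
            for j in range(kb, min(kb + block, n_k)):
                scores[i][j] = dot(qi, round_fp16(k[j])) * inv_sqrt_d
    return scores, [(qb, kb) for _, qb, kb in chosen]


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 64, 16
    q = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    k = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ref = [[dot(qi, kj) / math.sqrt(d) for kj in k] for qi in q]
    fp4_only, _ = mixed_precision_scores(q, k, hi_frac=0.0)
    mixed, upgraded = mixed_precision_scores(q, k, hi_frac=0.05)
    print("tiles upgraded to FP16:", len(upgraded), "of 64")
    print("FP4-only error:", abs_error(fp4_only, ref))
    print("mixed error:   ", abs_error(mixed, ref))
```

    With 64 tokens and 8-token blocks there are 64 tiles, so a 5% budget rounds up to 4 upgraded tiles. Ranking by FP4 score magnitude is only one plausible proxy for "important" blocks, and ThriftAttention's actual criterion may differ; a production kernel would also fuse both passes and apply the softmax and value product, which this sketch leaves out.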