ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention

ThriftAttention improves AI efficiency with selective mixed-precision attention

By the PulseAugur editorial team · [3 sources] · 2026-05-21 00:00

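Researchers have developed ThriftAttention, a new method for making long-context attention in AI models more efficient. The technique selectively applies higher precision (FP16) to a small subset of critical query-key interactions and processes the rest at lower precision (FP4). The aim is to retain near-FP16 quality at FP4 inference speed, mitigating the significant quality degradation that low precision often causes in long-context settings. The method has been shown to recover a large portion of the performance gap between FP4 and FP16 attention, with gains that grow as sequence length increases.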
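The idea of mixing precisions inside one attention call can be illustrated with a small NumPy simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm or kernel: the selection rule (keep the top-scoring keys per query, ranked by the low-precision scores), the `keep_frac` parameter, and the helper names `fake_fp4` and `mixed_precision_attention` are all hypothetical. FP4 is emulated by rounding to the E2M1 value grid with one floating-point scale per block of 16 values, and ordinary float64 arithmetic stands in for FP16.

```python
import numpy as np

# Magnitudes representable by an FP4 E2M1 element (sign handled separately).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def fake_fp4(x, block=16):
    """Emulate block-scaled FP4: scale each block so its largest magnitude
    maps to 6.0, round to the nearest grid value, then rescale.
    Assumes x.size is a multiple of `block`."""
    flat = x.reshape(-1, block)
    scale = np.abs(flat).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero
    scaled = flat / scale
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return (np.sign(scaled) * FP4_GRID[idx] * scale).reshape(x.shape)


def softmax(s):
    s = s - s.max(axis=-1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)


def mixed_precision_attention(q, k, v, keep_frac=0.1):
    """Low-precision scores everywhere, high-precision scores for the
    top `keep_frac` of keys per query (hypothetical selection rule)."""
    d = q.shape[-1]
    lo = fake_fp4(q) @ fake_fp4(k).T / np.sqrt(d)  # emulated FP4 scores
    # Computed densely here only for simplicity; a real kernel would
    # evaluate just the selected entries in high precision.
    hi = q @ k.T / np.sqrt(d)
    n_keep = int(round(keep_frac * k.shape[0]))
    scores = lo.copy()
    if n_keep > 0:
        top = np.argpartition(-lo, n_keep - 1, axis=-1)[:, :n_keep]
        rows = np.arange(q.shape[0])[:, None]
        scores[rows, top] = hi[rows, top]
    return softmax(scores) @ v


rng = np.random.default_rng(0)
q = rng.standard_normal((8, 32))
k = rng.standard_normal((64, 32))
v = rng.standard_normal((64, 32))
out = mixed_precision_attention(q, k, v, keep_frac=0.1)
```

With `keep_frac=1.0` the function reduces to ordinary full-precision attention, and with `keep_frac=0.0` it reduces to the emulated FP4 path, so the parameter sweeps between the two extremes the summary describes.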

Impact: ThriftAttention offers a way to substantially reduce the inference cost of long-context AI models without significant quality loss.

Ranking rationale: a research paper detailing a new method for improving AI model efficiency.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 3 sources. How we write summaries →

Coverage sources [3]

arXiv cs.LG TIER_1 English(EN) · Joe Sharratt · 2026-05-25 04:00

ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention

arXiv:2605.23081v1 Announce Type: new Abstract: Efficient attention algorithms are critical to mitigate the quadratic cost of attention in long-context workloads. Prior work utilises block-scaled quantisation techniques on Blackwell GPUs to move attention computation to 4-bit pre…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-21 00:00

ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention

ThriftAttention reduces the cost of long-context attention by selectively applying higher precision to critical query-key interactions, achieving near-full-precision quality at reduced bitwidth.
r/LocalLLaMA TIER_1 English(EN) · /u/miserlou · 2026-05-25 21:14

ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention

submitted by /u/miserlou (https://www.reddit.com/user/miserlou) · [link] https://arxiv.org/pdf/2605.23081 · https://www.reddit.com/r/LocalLLaMA/comments/1tnm56l/thriftattention_selective_mixed_precision_…
