Chiaroscuro Attention optimizes Transformer compute through dynamic token routing

By PulseAugur Editorial · [3 sources] · 2026-06-06 00:00

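Researchers have developed CHIAR-Former, a novel 4-layer Transformer model that optimizes compute usage by routing tokens dynamically. Rather than applying self-attention uniformly, CHIAR-Former analyzes each token's spectral entropy and directs it to one of three operators: DCT spectral mixing, RBF kernel mixing, or full self-attention. On large natural-language text, the approach reports a 45% perplexity improvement on WikiText-103 while using 62.5% fewer attention FLOPs than a standard Transformer.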
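
The routing idea can be illustrated with a minimal sketch. This is not the paper's implementation: the entropy definition (Shannon entropy of the normalized FFT power spectrum of each token embedding), the thresholds `low` and `high`, the function names, and the mapping from low entropy to the cheaper operators are all assumptions made for illustration only.

```python
import numpy as np

# Hypothetical operator labels, indexed by the route id returned below.
OPERATORS = ("dct_mixing", "rbf_mixing", "self_attention")


def spectral_entropy(tokens: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Normalized spectral entropy per token, roughly in [0, 1].

    tokens: array of shape [n_tokens, d_model].
    Assumption: entropy is taken over the power spectrum of each embedding.
    """
    power = np.abs(np.fft.rfft(tokens, axis=-1)) ** 2
    probs = power / (power.sum(axis=-1, keepdims=True) + eps)
    entropy = -(probs * np.log(probs + eps)).sum(axis=-1)
    # Divide by the maximum possible entropy (uniform spectrum).
    return entropy / np.log(probs.shape[-1])


def route(tokens: np.ndarray, low: float = 0.4, high: float = 0.8) -> np.ndarray:
    """Return one operator index per token (0, 1, or 2).

    Assumption: low-entropy tokens go to the cheapest operator and
    high-entropy tokens go to full self-attention. The thresholds are
    made-up values, not taken from the paper.
    """
    return np.digitize(spectral_entropy(tokens), [low, high])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 64))  # 6 toy token embeddings
    for idx in route(x):
        print(OPERATORS[idx])
```

In a sketch like this, a token whose spectrum is concentrated in a few frequencies is treated as needing little cross-token interaction, so only tokens with a broad spectrum pay for full self-attention.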

Impact: Introduces a method that significantly reduces the compute cost of Transformers on large text datasets.

Ranking rationale: This cluster contains a research paper detailing a novel model architecture and its performance evaluation.

AI-generated summary · Google Gemini · from 3 sources.

Coverage sources [3]

arXiv cs.AI TIER_1 English(EN) · Prateek Kumar Sikdar · 2026-06-09 04:00

Chiaroscuro Attention: Spending Compute in the Dark

arXiv:2606.08327v1 Announce Type: cross Abstract: Standard transformers apply self-attention uniformly at every layer and token, regardless of whether the input requires dynamic cross-token interaction. We propose CHIAR-Former (Chiaroscuro Attention), a 4-layer hybrid transformer…
arXiv cs.AI TIER_1 English(EN) · Prateek Kumar Sikdar · 2026-06-06 20:38

Chiaroscuro Attention: Spending Compute in the Dark

Standard transformers apply self-attention uniformly at every layer and token, regardless of whether the input requires dynamic cross-token interaction. We propose CHIAR-Former (Chiaroscuro Attention), a 4-layer hybrid transformer that routes each token to one of three operators …
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-06 00:00

Chiaroscuro Attention: Spending Compute in the Dark

CHIAR-Former uses spectral entropy-based routing to dynamically select between DCT, RBF, and self-attention operators, achieving improved efficiency on large text datasets while maintaining performance through hybrid attention mechanisms.
