Prism Transformer: Progressive Head Schedules for Hierarchical Attention Processing

Prism Transformer introduces progressive head schedules for hierarchical attention

By PulseAugur Editorial · [1 source] · 2026-06-29 04:00

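Researchers have introduced the Prism Transformer, a novel architecture that modifies the standard multi-head attention mechanism. Instead of giving every attention head the same dimensionality at every layer, the Prism Transformer progressively increases the number of heads across layers. This builds a local-to-global representational hierarchy: early layers use fewer, wider heads to capture complex local patterns, while deeper layers specialize with more, narrower heads. The architecture is parameter-neutral and introduces no additional training or inference overhead, yet it consistently outperforms uniform baselines on downstream zero-shot benchmarks.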
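The idea of a per-layer head schedule can be illustrated with a minimal Python sketch. The summary does not give the paper's actual schedule, so the doubling rule, the function names, and the example sizes below are all assumptions made for illustration. The sketch also shows one way such a design can be parameter-neutral: in standard multi-head attention the Q, K, V, and output projections are each d_model x d_model, so their size does not depend on how many heads the hidden dimension is split into.

```python
def progressive_head_schedule(num_layers, d_model, min_heads, max_heads):
    """Return a list of (num_heads, head_dim) pairs, one per layer.

    Hypothetical schedule (not taken from the paper): head counts double
    from min_heads up to max_heads, and the layers are divided evenly
    among those stages, so early layers get few wide heads and deep
    layers get many narrow heads.
    """
    stages = []
    heads = min_heads
    while heads <= max_heads:
        if d_model % heads != 0:
            raise ValueError(f"d_model={d_model} is not divisible by {heads} heads")
        stages.append(heads)
        heads *= 2

    schedule = []
    for layer in range(num_layers):
        # Map the layer index onto a stage index in [0, len(stages) - 1].
        stage = layer * len(stages) // num_layers
        num_heads = stages[stage]
        schedule.append((num_heads, d_model // num_heads))
    return schedule


def attention_projection_params(d_model):
    """Weights in the Q, K, V, and output projections (biases ignored).

    Each projection is d_model x d_model regardless of the head count,
    because num_heads * head_dim always equals d_model.
    """
    return 4 * d_model * d_model


if __name__ == "__main__":
    # Example sizes are illustrative assumptions, not values from the paper.
    for layer, (h, d_h) in enumerate(progressive_head_schedule(8, 512, 2, 16)):
        print(f"layer {layer}: {h} heads x {d_h} dims")
    print("projection params per layer:", attention_projection_params(512))
```

Under these assumed settings, the first layers use 2 heads of width 256 and the last layers use 16 heads of width 32, while the per-layer projection parameter count stays the same throughout.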

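Impact: This architectural modification could lead to more efficient use of model capacity and improve downstream task performance without increasing compute cost.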

Ranking rationale: The cluster contains a research paper detailing a novel Transformer architecture.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.LG · Shubham Aggarwal · 2026-06-29 04:00

Prism Transformer: Progressive Head Schedules for Hierarchical Attention Processing

arXiv:2606.27449v1 · Announce Type: new · Abstract: Multi-head attention conventionally partitions the hidden dimension equally across all heads at every layer, enforcing an identical representational subspace dimension (d_h = d_model / h) throughout the model's depth. In this work, we id…
