Caracal: Causal Architecture via Spectral Mixing

Caracal architecture uses Fourier transforms for efficient long-sequence modeling

By the PulseAugur editorial team · [2 sources] · 2026-04-30 23:31

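Researchers have introduced Caracal, a new architecture designed to improve how well large language models scale to long sequences. Caracal replaces the computationally expensive attention mechanism with a parameter-efficient multi-head Fourier module built on the Fast Fourier Transform. By addressing the quadratic cost of attention and the limitations of positional encodings, the approach offers a more efficient route to long-sequence modeling, and its reliance on standard library operators keeps it portable.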
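To make the idea of FFT-based sequence mixing concrete, here is a minimal NumPy sketch of a causal convolution computed in the frequency domain. It is not Caracal's Fourier module, whose details are not given in this summary; the function names `causal_fft_mix` and `causal_direct_mix`, the per-channel filter `kernel`, and the shapes are all assumptions chosen for illustration. It shows only the general technique: zero-padding to twice the sequence length makes the FFT's circular convolution match a linear one, so each output position depends only on current and earlier inputs.

```python
import numpy as np


def causal_fft_mix(x, kernel):
    """Causally mix a sequence along time using the FFT.

    x:      (L, D) input sequence (hypothetical layout: time, channels)
    kernel: (L, D) per-channel filter taps, kernel[0] weights the current step
    Returns a (L, D) array where output[t] depends only on x[0..t].
    """
    L = x.shape[0]
    n = 2 * L  # zero-pad so circular convolution equals linear convolution
    X = np.fft.rfft(x, n=n, axis=0)
    K = np.fft.rfft(kernel, n=n, axis=0)
    y = np.fft.irfft(X * K, n=n, axis=0)
    return y[:L]  # keep the first L steps; the tail is padding overflow


def causal_direct_mix(x, kernel):
    """Reference implementation: the same causal convolution, computed
    with explicit loops in quadratic time."""
    L, _ = x.shape
    y = np.zeros_like(x)
    for t in range(L):
        for s in range(t + 1):
            y[t] += kernel[t - s] * x[s]
    return y


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 4))
k = rng.standard_normal((16, 4))

y_fft = causal_fft_mix(x, k)
y_ref = causal_direct_mix(x, k)
```

The FFT route produces the same result as the quadratic loop while using only `rfft`, `irfft`, and an elementwise product, which are standard operators in most numerical libraries.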

Impact: Offers a more scalable and portable architecture for long-sequence modeling, with the potential to reduce compute costs.

Ranking rationale: An academic paper introducing a novel LLM architecture.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.LG TIER_1 · Bingzheng Gan, Tianyi Zhang, Yusu Li, Jing Huang, Wei Shi, Yangkai Ding, Tao Yu · 2026-05-04 04:00

Caracal: Causal Architecture via Spectral Mixing

arXiv:2605.00292v1 Announce Type: new Abstract: The scalability of Large Language Models to long sequences is hindered by the quadratic cost of attention and the limitations of positional encodings. To address these, we introduce Caracal, a novel architecture that replaces attent…
arXiv cs.AI TIER_1 · Tao Yu · 2026-04-30 23:31

Caracal: Causal Architecture via Spectral Mixing

The scalability of Large Language Models to long sequences is hindered by the quadratic cost of attention and the limitations of positional encodings. To address these, we introduce Caracal, a novel architecture that replaces attention with a parameter-efficient, $\mathcal{O}(L \…
