PulseAugur
Caracal: Causal Architecture via Spectral Mixing

Caracal architecture uses Fourier transforms for efficient long-sequence modeling

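Researchers have introduced Caracal, a new architecture designed to improve how well large language models (LLMs) scale to long sequences. Caracal replaces the computationally expensive attention mechanism with a parameter-efficient, multi-head Fourier module built on the Fast Fourier Transform (FFT). By targeting both the quadratic cost of attention and the limitations of positional encodings, the approach offers a more efficient route to long-sequence modeling, and it stays portable because it is built from standard library operators.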
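The summary does not describe the module's internals, so the following is only a minimal sketch of what FFT-based causal mixing can look like, assuming each "head" applies one learned causal filter to its own group of channels. The function name `causal_fft_mix` and the filter layout `(H, L)` are hypothetical, not the authors' published design; in particular, one length-L filter per head is not necessarily how Caracal achieves its parameter efficiency. The key trick is zero-padding to length 2L: the FFT computes a circular convolution, and padding makes it equal to an ordinary linear convolution, so output position t only depends on inputs at positions up to t. Each FFT costs O(L log L), compared with the O(L²) score matrix of full attention.

```python
import numpy as np

def causal_fft_mix(x: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Hypothetical multi-head causal spectral mixing (illustrative sketch only).

    x:       (L, D) sequence of L tokens with D channels
    filters: (H, L) one length-L causal filter per head; D must be divisible by H
    Returns: (L, D) with y[t, d] = sum_{s <= t} filters[h, t - s] * x[s, d],
             where h is the head that channel d belongs to.
    """
    L, D = x.shape
    H = filters.shape[0]
    assert D % H == 0, "channels must split evenly across heads"
    assert filters.shape[1] == L, "this sketch expects one length-L filter per head"

    n = 2 * L  # pad to 2L so circular convolution equals linear (causal) convolution
    # Group channels by head: (L, D) -> (H, L, D // H)
    xh = x.reshape(L, H, D // H).transpose(1, 0, 2)

    Xf = np.fft.rfft(xh, n=n, axis=1)       # (H, n//2 + 1, D // H)
    Kf = np.fft.rfft(filters, n=n, axis=1)  # (H, n//2 + 1)

    # Pointwise product in frequency space = convolution in token space
    yh = np.fft.irfft(Xf * Kf[:, :, None], n=n, axis=1)[:, :L, :]

    # Undo the head grouping: (H, L, D // H) -> (L, D)
    return yh.transpose(1, 0, 2).reshape(L, D)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 8))    # 16 tokens, 8 channels
    k = rng.standard_normal((2, 16))    # 2 heads, one length-16 filter each
    print(causal_fft_mix(x, k).shape)   # (16, 8)
```

Comparing the result against a direct double loop over `t` and `s`, or perturbing one position and confirming that earlier outputs do not change, is a quick way to verify that the padding really enforces causality.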

Impact: Offers a more scalable and portable architecture for long-sequence modeling, potentially lowering computational cost.
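To put the quadratic cost in rough numbers, the back-of-the-envelope comparison below counts the L² query-key scores of full attention against n·log₂n operations for an FFT padded to n = 2L for causality. These are asymptotic proxies with constant factors and memory traffic ignored, not measured costs of Caracal or any real implementation, and the helper names are hypothetical. The ratio simplifies to L / (2·log₂(2L)), so it keeps growing as sequences get longer.

```python
import math

def attention_pairs(seq_len: int) -> int:
    """Proxy for attention cost: number of query-key scores (seq_len x seq_len)."""
    return seq_len * seq_len

def fft_mix_ops(seq_len: int) -> float:
    """Proxy for FFT mixing cost: n * log2(n) with n = 2 * seq_len (zero-padded for causality)."""
    n = 2 * seq_len
    return n * math.log2(n)

def cost_ratio(seq_len: int) -> float:
    """How many times larger the attention proxy is than the FFT proxy."""
    return attention_pairs(seq_len) / fft_mix_ops(seq_len)

if __name__ == "__main__":
    for seq_len in (1_024, 32_768, 1_048_576):
        print(f"L={seq_len:>9,}: attention/FFT proxy ratio ~ {cost_ratio(seq_len):,.0f}")
```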

Ranking rationale: An academic paper introducing a novel LLM architecture.

Read on arXiv cs.AI

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.LG TIER_1 · Bingzheng Gan, Tianyi Zhang, Yusu Li, Jing Huang, Wei Shi, Yangkai Ding, Tao Yu

    Caracal: Causal Architecture via Spectral Mixing

    arXiv:2605.00292v1 Announce Type: new Abstract: The scalability of Large Language Models to long sequences is hindered by the quadratic cost of attention and the limitations of positional encodings. To address these, we introduce Caracal, a novel architecture that replaces attent…

  2. arXiv cs.AI TIER_1 · Tao Yu

    Caracal: Causal Architecture via Spectral Mixing

    The scalability of Large Language Models to long sequences is hindered by the quadratic cost of attention and the limitations of positional encodings. To address these, we introduce Caracal, a novel architecture that replaces attention with a parameter-efficient, $\mathcal{O}(L \…