Student develops Silia, a parameter-efficient Transformer architecture

By the PulseAugur editorial team · [1 source] · 2026-06-11 05:05

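A student researcher has developed a new Transformer architecture called Silia, designed for efficient modeling and classification under a strict budget of at most 10 million parameters. The architecture aims to merge the dynamic token mixing of attention with the strong nonlinearity of the feed-forward network (FFN) into a single unified operation. According to the author's experiments, Silia reaches performance comparable to GPT-2 with significantly fewer parameters, although limited hardware and compute budget restricted the scope of testing.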
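For a sense of scale, the parameter count of a conventional GPT-2-style decoder can be worked out by hand. The sketch below is not Silia's design, which the summary does not specify; it only counts the parameters of a standard block with separate attention and FFN sublayers, assuming tied input/output embeddings, learned position embeddings, and a 4x FFN expansion. The function names and the small example configuration are hypothetical, chosen purely for illustration.

```python
def block_params(d_model: int) -> dict:
    """Parameters in one standard GPT-2-style block (weights plus biases)."""
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)  # QKV + output projection
    ffn = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)  # up + down projection
    norms = 2 * (2 * d_model)  # two LayerNorms, each with scale and shift
    return {"attention": attn, "ffn": ffn, "layernorm": norms}


def gpt2_style_param_count(vocab: int, n_ctx: int, d_model: int, n_layers: int) -> int:
    """Total parameters, assuming the output head shares the token embedding."""
    embeddings = vocab * d_model + n_ctx * d_model
    per_block = sum(block_params(d_model).values())
    final_norm = 2 * d_model
    return embeddings + n_layers * per_block + final_norm


# GPT-2 small configuration, used here only as a reference point.
gpt2_small = gpt2_style_param_count(vocab=50257, n_ctx=1024, d_model=768, n_layers=12)

# Hypothetical small configuration that stays within a 10M budget.
tiny = gpt2_style_param_count(vocab=8192, n_ctx=256, d_model=256, n_layers=6)

print(f"GPT-2 small: {gpt2_small:,} parameters")
print(f"tiny config: {tiny:,} parameters, within 10M: {tiny <= 10_000_000}")
print(block_params(256))
```

In a block like this the FFN holds roughly twice as many parameters as attention, which is why folding the two sublayers into one operation is a natural place to look for savings at this scale.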

Impact: Introduces a potential approach to building more parameter-efficient Transformer models for resource-constrained environments.

Ranking rationale: This cluster contains a research paper detailing a novel model architecture.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

r/LocalLLaMA · English (EN) · /u/SrijSriv211 · 2026-06-11 05:05

Tiny Scale Is All I Can Spare To Play With Transformer

Hi! I am a student from India, this is my first paper that I published. I was curious whether I can combine both Attention and FFN together to save parameters without sacrificing performance, specifically at parameters <= 10M. Ba…
