Sakana AI and NVIDIA Introduce TwELL with CUDA Kernels for 20.5% Inference and 21.9% Training Speedup in LLMs

Sakana AI and NVIDIA Release TwELL to Speed Up LLM Training and Inference

By the PulseAugur editorial team · [2 sources] · 2026-05-11 08:36

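Researchers at Sakana AI and NVIDIA have developed TwELL, a new method that significantly speeds up large language model (LLM) operations. By targeting the compute-intensive feedforward layers, TwELL achieves high sparsity and converts it into real performance gains on GPUs. The method delivers up to 21.9% faster training and up to 20.5% faster inference without compromising model accuracy.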
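
The MarkTechPost excerpt cited below credits the sparsity to L1 regularization on the feedforward layers. As a rough illustration only, the pure-Python sketch here shows what an L1 penalty on hidden activations looks like and why zero activations let the second projection skip work. It is not TwELL's sparse data format or its CUDA kernels: the function names, the ReLU activation, the choice to penalize activations, and the toy weights are all assumptions made for this example.

```python
# Illustrative sketch only; every name and value here is hypothetical.

def matvec(W, x):
    """Dense matrix-vector product, W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def ffn_hidden(W_up, x):
    """Hidden activations of a ReLU feedforward block: h = relu(W_up @ x)."""
    return [max(0.0, v) for v in matvec(W_up, x)]

def l1_penalty(h, lam):
    """L1 term that would be added to the training loss: lam * sum(|h_i|)."""
    return lam * sum(abs(v) for v in h)

def sparsity(h):
    """Fraction of hidden units that are exactly zero."""
    return sum(1 for v in h if v == 0.0) / len(h)

def sparse_down_projection(W_down_cols, h):
    """Down projection that only touches columns with a nonzero activation.

    W_down_cols[j] is column j of the down-projection matrix, so the
    result equals the dense product while skipping every zero in h.
    """
    out = [0.0] * len(W_down_cols[0])
    for j, hj in enumerate(h):
        if hj != 0.0:
            for i, w in enumerate(W_down_cols[j]):
                out[i] += hj * w
    return out

if __name__ == "__main__":
    W_up = [[1.0, -1.0], [-2.0, 0.5], [0.5, 0.5], [-1.0, -1.0]]
    W_down_cols = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [1.0, 1.0]]
    x = [1.0, 2.0]

    h = ffn_hidden(W_up, x)
    print("hidden activations:", h)
    print("sparsity:", sparsity(h))
    print("L1 penalty (lam=0.01):", l1_penalty(h, 0.01))
    print("output:", sparse_down_projection(W_down_cols, h))
```

In this toy case only one of four hidden units is active, so the down projection reads a single column. Turning that kind of saving into GPU throughput is the part the sources attribute to the new sparse formats and fused kernels, which this sketch does not attempt to reproduce.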

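Impact: Faster LLM training and inference could lower the cost of AI development and make it more accessible.

Ranking rationale: A research paper introducing a new LLM technique and the speedups it delivers.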

Read on MarkTechPost →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

MarkTechPost · Asif Razzaq · 2026-05-11 08:36

Sakana AI and NVIDIA Introduce TwELL with CUDA Kernels for 20.5% Inference and 21.9% Training Speedup in LLMs

Sakana AI and NVIDIA researchers demonstrate that simple L1 regularization can induce over 99% sparsity in feedforward layers with negligible downstream performance impact, and translate that sparsity into real GPU throughput gains using new sparse data formats and fused CUDA …

Mastodon (mastodon.social) · 2026-05-11 08:51

Sakana AI and NVIDIA have introduced TwELL, a new approach using CUDA kernels that achieves 20.5% inference and 21.9% training speedup in large language models. The technique targets feedforward layers, which account for over two-thirds of model parameters and 80% of FLOPs, by in…

Link: marktechpost.com/…/sakana-ai-and-nvidia-i…
