PulseAugur

Sakana AI, NVIDIA unveil TwELL for faster LLM training and inference

Researchers from Sakana AI and NVIDIA have developed TwELL, a method that significantly speeds up large language model (LLM) training and inference. By targeting the computationally intensive feedforward layers, TwELL induces high sparsity and translates it into practical performance gains on GPUs, achieving up to a 21.9% speedup in training and a 20.5% speedup in inference without compromising model accuracy.

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Accelerates LLM training and inference, potentially lowering costs and increasing accessibility for AI development.

RANK_REASON Research paper introducing a new technique and associated speedups for LLMs.

Read on MarkTechPost →


COVERAGE [2]

  1. MarkTechPost TIER_1 · Asif Razzaq

    Sakana AI and NVIDIA Introduce TwELL with CUDA Kernels for 20.5% Inference and 21.9% Training Speedup in LLMs

    Sakana AI and NVIDIA researchers demonstrate that simple L1 regularization can induce over 99% sparsity in feedforward layers with negligible downstream performance impact, and translate that sparsity into real GPU throughput gains using new sparse data formats and fused CUDA …

  2. Mastodon — mastodon.social TIER_1 · [email protected]

    Sakana AI and NVIDIA have introduced TwELL, a new approach using CUDA kernels that achieves 20.5% inference and 21.9% training speedup in large language models.

    Sakana AI and NVIDIA have introduced TwELL, a new approach using CUDA kernels that achieves 20.5% inference and 21.9% training speedup in large language models. The technique targets feedforward layers, which account for over two-thirds of model parameters and 80% of FLOPs, by in…
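The excerpts describe the core idea, an L1 penalty driving most feedforward weights to exactly zero, but not the training recipe itself. As a loose illustration of that general principle (not TwELL's actual method), here is a toy proximal-gradient sketch in NumPy where the L1 penalty's soft-thresholding step zeroes out the weights of a small linear layer; all sizes and hyperparameters are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "feedforward layer" y = x @ W, where only a few input rows matter.
n_samples, d_in, d_out = 256, 64, 32
X = rng.normal(size=(n_samples, d_in))
W_true = np.zeros((d_in, d_out))
W_true[:4, :] = rng.normal(size=(4, d_out))  # mostly-zero ground truth
Y = X @ W_true

W = rng.normal(size=(d_in, d_out)) * 0.01
lr, lam = 0.1, 0.1  # hypothetical step size and L1 strength

for _ in range(500):
    grad = X.T @ (X @ W - Y) / n_samples  # squared-error gradient
    W -= lr * grad
    # Proximal step for the L1 penalty: soft-thresholding pushes
    # small weights to exactly zero, which is what creates the
    # sparsity a sparse format or kernel can exploit.
    W = np.sign(W) * np.maximum(np.abs(W) - lr * lam, 0.0)

sparsity = float(np.mean(W == 0.0))
print(f"zero-weight fraction: {sparsity:.2f}")
```

The soft-threshold step is the key design point: it makes the sparsity exact rather than approximate, since any weight whose gradient pull stays below the penalty collapses to zero instead of hovering near it.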