NVIDIA Apex tutorial optimizes Transformer training with fused kernels

By PulseAugur Editorial · [1 source] · 2026-06-02 01:39

This tutorial shows how to speed up Transformer training with NVIDIA Apex, focusing on its fused kernels such as FusedAdam and FusedLayerNorm. It walks through building Apex from source with the CUDA extensions enabled, so that the high-performance kernels are actually available rather than a limited Python-only installation. It then benchmarks FusedAdam against PyTorch's AdamW, compares Apex's normalization layers with the standard ones, and measures the resulting throughput in a Transformer training experiment.
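As a rough illustration of the "are the fused kernels really there?" check, the sketch below looks for Apex and for the compiled extension modules its wrappers load. It is not taken from the tutorial: the helper names `module_available` and `detect_apex_kernels` are hypothetical, and the extension module names (`amp_C`, `fused_layer_norm_cuda`) are an assumption that may vary between Apex versions.

```python
import importlib.util

# Assumed mapping from Apex feature to the compiled extension it relies on.
# These module names are an assumption and may differ between Apex versions.
APEX_EXTENSIONS = {
    "FusedAdam": "amp_C",
    "FusedLayerNorm": "fused_layer_norm_cuda",
}


def module_available(name):
    """Return True if a top-level module can be located, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def detect_apex_kernels():
    """Report whether Apex and each assumed compiled extension appear installed."""
    report = {"apex": module_available("apex")}
    for feature, extension in APEX_EXTENSIONS.items():
        # A feature counts as available only if both the Python package
        # and its compiled backend can be found.
        report[feature] = report["apex"] and module_available(extension)
    return report


if __name__ == "__main__":
    for name, ok in detect_apex_kernels().items():
        print(f"{name}: {'available' if ok else 'missing'}")
```

A Python-only install would typically show `apex` as available and the fused features as missing, which is the situation the source build is meant to avoid.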

IMPACT Optimizes existing AI training workflows, potentially reducing compute costs and accelerating development cycles.

RANK_REASON Tutorial on using existing software components for optimization.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

MarkTechPost TIER_1 English(EN) · Sana Hassan · 2026-06-02 01:39

How to Speed Up Transformer Training Using NVIDIA Apex (FusedAdam, FusedLayerNorm) and Native torch.amp

We build NVIDIA Apex from source, detect fused kernels, and benchmark FusedAdam, FusedLayerNorm, and torch.amp in Transformer training.
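A minimal sketch of how such an optimizer benchmark might be set up is shown below. It is not the article's code: `make_optimizer` and `time_steps` are hypothetical helpers, the model size, learning rate, and weight decay are arbitrary, and the loss is a placeholder. It falls back to `torch.optim.AdamW` when Apex or CUDA is unavailable, and the optional `use_amp` flag assumes bfloat16 autocast so that no gradient scaler is needed.

```python
import time

import torch
from torch import nn


def make_optimizer(params, lr=1e-3, weight_decay=0.01, prefer_fused=True):
    """Return (optimizer, name), using Apex FusedAdam only when it is usable."""
    params = list(params)
    if prefer_fused and torch.cuda.is_available():
        try:
            from apex.optimizers import FusedAdam
            # adam_w_mode=True selects decoupled (AdamW-style) weight decay.
            opt = FusedAdam(params, lr=lr, weight_decay=weight_decay,
                            adam_w_mode=True)
            return opt, "apex.FusedAdam"
        except (ImportError, RuntimeError):
            pass  # Apex missing or built without CUDA extensions.
    opt = torch.optim.AdamW(params, lr=lr, weight_decay=weight_decay)
    return opt, "torch.AdamW"


def train_step(model, optimizer, batch, use_amp=False):
    """One forward, backward, and optimizer update on a placeholder loss."""
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=batch.device.type,
                        dtype=torch.bfloat16, enabled=use_amp):
        loss = model(batch).float().pow(2).mean()
    loss.backward()
    optimizer.step()


def time_steps(model, optimizer, batch, steps=20, warmup=3, use_amp=False):
    """Average seconds per training step, excluding warmup iterations."""
    for _ in range(warmup):
        train_step(model, optimizer, batch, use_amp)
    if batch.is_cuda:
        torch.cuda.synchronize()  # Make sure queued GPU work has finished.
    start = time.perf_counter()
    for _ in range(steps):
        train_step(model, optimizer, batch, use_amp)
    if batch.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / steps


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    batch = torch.randn(8, 16, 64, device=device)
    for prefer_fused in (False, True):
        model = nn.TransformerEncoderLayer(
            d_model=64, nhead=4, dim_feedforward=128, batch_first=True
        ).to(device)
        opt, name = make_optimizer(model.parameters(), prefer_fused=prefer_fused)
        print(f"{name}: {time_steps(model, opt, batch) * 1e3:.2f} ms/step")
```

On a machine without Apex both runs report `torch.AdamW`, so the comparison is only meaningful once the source build with CUDA extensions has succeeded. Timings from a model this small mostly reflect launch overhead and should not be read as representative throughput.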
