NVIDIA has released NeMo AutoModel, an open library integrated with its NeMo framework and designed to accelerate fine-tuning of large Mixture-of-Experts (MoE) models. The library builds on Hugging Face's Transformers v5, adding Expert Parallelism and TransformerEngine kernels. According to the summary's figures, this yields up to 3.7x higher training throughput and a 32% reduction in GPU memory usage compared to standard Transformers v5, while keeping the familiar `from_pretrained()` API.
AI IMPACT Accelerates fine-tuning of large AI models, potentially reducing costs and time for researchers and developers.
RANK_REASON This is a library release that enhances existing frameworks, not a novel model release from a frontier lab.
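To make the quoted figures concrete, here is a minimal arithmetic sketch of what "up to 3.7x throughput" and "32% less GPU memory" would mean for a single fine-tuning run. The function name, the baseline of 37 hours, and the 100 GB footprint are hypothetical values chosen for illustration, not measurements from the release, and because the reported numbers are upper bounds the result is a best case.

```python
def best_case_estimate(baseline_hours, baseline_mem_gb,
                       speedup=3.7, mem_reduction=0.32):
    """Scale a baseline run by the summary's 'up to' figures.

    speedup: throughput multiplier (3.7x is the reported upper bound).
    mem_reduction: fractional GPU memory saving (0.32 = 32%).
    Both defaults come from the summary; the baselines are assumptions.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if not 0 <= mem_reduction < 1:
        raise ValueError("mem_reduction must be in [0, 1)")
    # Higher throughput shortens wall-clock time for a fixed amount of work.
    hours = baseline_hours / speedup
    # A fractional reduction leaves (1 - reduction) of the original footprint.
    mem_gb = baseline_mem_gb * (1 - mem_reduction)
    return hours, mem_gb


# Hypothetical baseline: a 37-hour run using 100 GB of GPU memory.
hours, mem_gb = best_case_estimate(37.0, 100.0)
print(f"best case: {hours:.1f} h, {mem_gb:.1f} GB")
```

Actual gains would depend on the model, hardware, and parallelism configuration, so the output is best read as a ceiling rather than a forecast.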
- Expert Parallelism
- Hugging Face
- NeMo AutoModel
- Nemotron 3 Nano 30B
- NVIDIA
- NVIDIA NeMo
- NVIDIA Nemotron 3 Ultra 550B
- PyTorch
- Qwen3 30B
- SGLang
- TransformerEngine
- Transformers
- vLLM
AI-generated summary · Google Gemini · from 1 source.