NVIDIA has released two new large language models, Nemotron 3 Nano and Nemotron 3 Ultra, that focus on efficiency and advanced capabilities. Nemotron 3 Nano is a 30B-class model designed for private inference and agentic workflows; it uses a hybrid Mamba-Transformer Mixture-of-Experts architecture and supports a context window of up to 1 million tokens for long-context applications. Nemotron 3 Ultra, a 550B-parameter model, uses a similar hybrid architecture with LatentMoE to achieve faster inference than similarly sized models, incorporates native speculative decoding, and was trained with a novel 4-bit precision format.
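The Mixture-of-Experts part of this architecture is what lets a large model run only a fraction of its parameters for each token. The sketch below shows generic top-k expert routing for a single token in plain Python. It is an illustration under stated assumptions, not NVIDIA's Nemotron or LatentMoE implementation (the summary does not describe those internals): the function name `moe_layer`, the router weights, the toy "experts" that simply scale their input, and `top_k=2` are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def moe_layer(
    token: Vector,
    router_weights: List[Vector],  # hypothetical: one router row per expert
    experts: List[Callable[[Vector], Vector]],
    top_k: int = 2,
) -> Tuple[Vector, List[int]]:
    """Route one token to its top-k experts and mix their outputs.

    Only the selected experts are evaluated, which is why an MoE model's
    active parameter count per token is far smaller than its total.
    """
    # 1. Router: one score (logit) per expert.
    logits = [dot(token, w) for w in router_weights]
    # 2. Keep the k highest-scoring experts.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # 3. Renormalize the gate weights over the chosen experts only.
    gates = softmax([logits[i] for i in chosen])
    # 4. Gate-weighted sum of the chosen experts' outputs.
    out = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        for j, v in enumerate(experts[idx](token)):
            out[j] += gate * v
    return out, chosen


# Usage with toy experts (hypothetical): expert i multiplies its input by a constant.
def make_expert(scale: float) -> Callable[[Vector], Vector]:
    return lambda v: [scale * x for x in v]


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
out, chosen = moe_layer([1.0, 2.0], router, experts, top_k=2)
print(chosen)  # [2, 1]: only two of the four experts run for this token
print(out)
```

Real MoE layers batch this over many tokens, add load-balancing terms during training, and run experts as learned feed-forward networks; the point here is only that compute scales with `top_k`, not with the total number of experts.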
IMPACT These models offer efficient reasoning and long-context capabilities, which could lower the barrier to deploying advanced AI agents and applications.
RANK_REASON NVIDIA is a frontier lab, and the cluster describes new model releases (Nemotron 3 Nano and Nemotron 3 Ultra) with details on their architecture and capabilities.
- LatentMoE
- Mamba
- mixture of experts
- Multi Token Prediction
- Nemotron 3 Ultra
- NVIDIA
- transformer
- GPT OSS 20B
- Mamba-2
- miniF2F
- Nemotron 3 Nano
- Qwen3-30B-A3B-Thinking-2507
AI-generated summary · Google Gemini · from 1 source.