Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Researchers have introduced Nemotron 3 Ultra, a 550-billion-parameter language model that combines a hybrid Mamba-Transformer architecture with a Mixture-of-Experts design. The model was trained on 20 trillion tokens and supports a context length of 1 million tokens, and it incorporates techniques such as LatentMoE and Multi-Token Prediction. Nemotron 3 Ultra delivers up to six times higher inference throughput than current state-of-the-art models while maintaining comparable accuracy, making it suitable for complex agentic tasks. The model's checkpoints, training data, and recipe have been open-sourced on Hugging Face.
AI IMPACT This open-source release of a high-throughput, long-context model could accelerate agentic AI development and research.