Researchers have introduced Nemotron 3 Ultra, a 550-billion-parameter language model built on a hybrid Mamba-Transformer architecture with a Mixture-of-Experts design. The model was trained on 20 trillion tokens, supports a context length of 1 million tokens, and uses techniques such as LatentMoE and Multi Token Prediction. Nemotron 3 Ultra achieves up to six times the inference throughput of current state-of-the-art models at comparable accuracy, which makes it suited to complex agentic tasks. The model's checkpoints, training data, and recipe have been open-sourced on Hugging Face.
IMPACT: This open-source release of a high-throughput, long-context model could accelerate agentic AI development and research.
RANK_REASON: The cluster describes a new research paper detailing a novel language model architecture and its performance, with open-sourced components.
- Hugging Face
- LatentMoE
- Mamba-Transformer
- Mixture-of-Experts
- Multi Token Prediction
- Nemotron 3 Ultra
- NVFP4
- Multi-Teacher On-Policy Distillation
- RLVR
AI-generated summary · Google Gemini · from 2 sources.