Nemotron 3 Ultra: Open-Source LLM Boasts 1M Context, 6x Throughput

By PulseAugur Editorial · [2 sources] · 2026-06-12 00:00

NVIDIA researchers have introduced Nemotron 3 Ultra, a Mixture-of-Experts language model with 550 billion total and 55 billion active parameters, built on a hybrid Mamba-Transformer architecture. The model was pre-trained on 20 trillion text tokens and supports a 1 million token context length, using techniques such as LatentMoE and Multi-Token Prediction. According to the paper, Nemotron 3 Ultra delivers up to six times higher inference throughput than current state-of-the-art models while maintaining comparable accuracy, which makes it suitable for complex agentic tasks. The model's checkpoints, training data, and recipe have been open-sourced on Hugging Face.
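As a rough illustration of what the Mixture-of-Experts design means in practice, the sketch below computes the share of parameters active per token, using the 550 billion total and 55 billion active figures from the paper abstract quoted in the coverage list. The function name and constants are hypothetical, chosen for this example only, and the result does not by itself account for the reported throughput gain.

```python
# Parameter counts as reported in the paper abstract, in billions.
TOTAL_PARAMS_B = 550
ACTIVE_PARAMS_B = 55


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters engaged per token in a sparse MoE model.

    Hypothetical helper for illustration; not part of any released code.
    """
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active per token: {fraction:.0%} of total parameters")
```

In other words, only about a tenth of the model's weights take part in processing any single token, which is the usual motivation for sparse expert routing.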

IMPACT This open-source release of a high-throughput, long-context model could accelerate agentic AI development and research.

RANK_REASON The cluster describes a new research paper detailing a novel language model architecture and its performance, with open-sourced components.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · NVIDIA, Aaron Blakeman, Aaron Thomas, Aastha Jhunjhunwala, Abhibha Gupta, Abhinav Khattar, Adam Rajfer, Adi Renduchintala, Adil Asif, Aditya Vavre, Adriana Flores… · 2026-06-16 04:00

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

arXiv:2606.15007v1 Announce Type: cross Abstract: We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybrid Mamba-Attention language model. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens, then extended the context len…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-12 00:00

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Nemotron 3 Ultra is a large-scale language model featuring hybrid Mamba-Attention architecture with 550 billion parameters, achieving high inference throughput and extended context length through specialized training techniques.
