Together AI releases NVIDIA Nemotron 3 Ultra and 3.5 ASR models

By PulseAugur Editorial · [3 sources] · 2026-06-04 13:01

Together AI has released two new NVIDIA Nemotron models, Nemotron 3 Ultra and Nemotron 3.5 ASR. Nemotron 3 Ultra is designed for complex agentic tasks requiring long-term planning and iteration, boasting 550 billion parameters and a 1 million token context window. Nemotron 3.5 ASR is optimized for streaming speech recognition and voice agents, offering low-latency multilingual capabilities. AI

IMPACT Provides new open-source models for agentic workloads and speech recognition, potentially lowering barriers for developers.

RANK_REASON Release of new open-source models from a non-frontier lab.

Read on X — Together (inference / OSS) →

AI-generated summary · Google Gemini · from 3 sources. How we write summaries →

Together AI releases NVIDIA Nemotron 3 Ultra and 3.5 ASR models

COVERAGE [3]

X — Together (inference / OSS) TIER_1 English(EN) · togethercompute · 2026-06-04 13:03

Nemotron 3.5 ASR is built for streaming multilingual speech recognition and voice agents.

Nemotron 3.5 ASR is built for streaming multilingual speech recognition and voice agents. One 0.6B checkpoint. 40 language-locales. Sub-100ms latency. Cache-aware FastConformer carries context forward between chunks instead of repeatedly reprocessing overlapping audio. Try it:
X — Together (inference / OSS) TIER_1 English(EN) · togethercompute · 2026-06-04 13:01

Nemotron 3 Ultra is built for long-horizon agents that need to plan, code, test, debug, and iterate across large working sets.

Nemotron 3 Ultra is built for long-horizon agents that need to plan, code, test, debug, and iterate across large working sets. 550B parameters, 55B active, 1M context, Hybrid Mamba-Transformer MoE, Multi-Token Prediction, and multi-environment RL training. Try it:
X — Together (inference / OSS) TIER_1 English(EN) · togethercompute · 2026-06-04 13:01

Introducing two NVIDIA Nemotron models on Together AI: Nemotron 3 Ultra for high-throughput agentic workloads and Nemotron 3.5 ASR for low-latency multilingual

Introducing two NVIDIA Nemotron models on Together AI: Nemotron 3 Ultra for high-throughput agentic workloads and Nemotron 3.5 ASR for low-latency multilingual speech recognition. AI natives can now build coding agents, deep research agents, and real-time voice systems on the AI…

COVERAGE [3]

Nemotron 3.5 ASR is built for streaming multilingual speech recognition and voice agents.

Nemotron 3 Ultra is built for long-horizon agents that need to plan, code, test, debug, and iterate across large working sets.

Introducing two NVIDIA Nemotron models on Together AI: Nemotron 3 Ultra for high-throughput agentic workloads and Nemotron 3.5 ASR for low-latency multilingual

RELATED ENTITIES

RELATED TOPICS