NVIDIA's 550B finally lands: free to use, expensive to host
NVIDIA has released Nemotron 3 Ultra, a 550-billion-parameter open-weight model with a hybrid Mamba-Attention design and a 1-million-token context window. The weights are freely available under the OpenMDW-1.1 license, but self-hosting requires datacenter-class hardware, such as multiple H100 or H200 GPUs. For easier access, NVIDIA offers a hosted API that is compatible with the OpenAI protocol.
AI IMPACT This release provides a powerful open-weight model, but its demanding hardware requirements highlight the ongoing challenges of self-hosting large AI systems.
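To put the "expensive to host" claim in rough numbers, here is a back-of-the-envelope sketch of how much GPU memory the weights alone would occupy. The 550 billion parameter count comes from the release; the bytes-per-parameter figures are assumptions about common weight formats, since the precision of the published weights is not stated here. The GPU capacities are the published 80 GB (H100 SXM) and 141 GB (H200) figures. The function names are illustrative only.

```python
import math

PARAMS = 550e9  # parameter count reported for the model

# Assumption: bytes per parameter for common weight formats.
# The precision of the released weights is not stated in the article.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

# Published HBM capacity per GPU, in decimal gigabytes (SXM parts).
GPU_MEMORY_GB = {"H100": 80, "H200": 141}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def min_gpus(weight_gb: float, gpu_gb: float) -> int:
    """Lower bound on GPU count: weights only, ignoring KV cache,
    activations, and framework overhead."""
    return math.ceil(weight_gb / gpu_gb)


def estimate() -> dict:
    """Map each weight format to the minimum GPU count per GPU type."""
    table = {}
    for fmt, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(PARAMS, nbytes)
        table[fmt] = {gpu: min_gpus(gb, cap) for gpu, cap in GPU_MEMORY_GB.items()}
    return table


if __name__ == "__main__":
    for fmt, row in estimate().items():
        gb = weight_memory_gb(PARAMS, BYTES_PER_PARAM[fmt])
        print(f"{fmt}: {gb:.0f} GB of weights, at least {row}")
```

Under these assumptions, 16-bit weights come to about 1,100 GB, which is at least 14 H100s or 8 H200s before any memory is set aside for the KV cache. A 1-million-token context would add substantially to that, so real deployments should be expected to need more than these lower bounds.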