Perplexity has published research detailing how it serves large language models, specifically Qwen3 235B, on NVIDIA GB200 NVL72 Blackwell racks. According to the findings, the GB200 platform delivers lower latency and higher throughput for large-model inference than previous NVIDIA hardware. The research also highlights the GB200's capabilities for both training and high-throughput inference, particularly for Mixture-of-Experts (MoE) models.
Impact: NVIDIA's GB200 Blackwell platform shows significant gains in LLM inference speed and cost-efficiency, which could accelerate the deployment of large models.
Ranking rationale: The cluster contains research published by Perplexity on LLM inference hardware.
AI-generated summary · Google Gemini · from 4 sources.