Nvidia B200
PulseAugur coverage of Nvidia B200 — every cluster mentioning Nvidia B200 across labs, papers, and developer communities, ranked by signal.
3 天有情绪数据
-
Modal achieves serverless GPUs for AI inference in seconds
Modal has developed a system to achieve truly serverless GPUs for AI inference, addressing the challenge of rapidly scaling resources to meet variable demand. Their approach involves maintaining cloud buffers of idle GP…
-
Together AI releases FlashAttention-3 and -4 for faster LLM processing
Together AI has released FlashAttention-3 and FlashAttention-4, significant upgrades to their GPU-accelerated attention mechanism for large language models. FlashAttention-3, designed for Hopper GPUs, achieves up to 75%…
-
Cohere releases open-source Command A+ AI model for enterprise agents
Cohere has released Command A+, an open-source, multimodal AI model designed for enterprise use and agentic tasks. This new model integrates reasoning, vision, and multilingual capabilities, supporting 48 languages and …
-
AMD MI355 cheaper than Nvidia B200 for GLM5 serving
AMD's MI355 accelerator is now 40% cheaper than Nvidia's B200 for serving on the GLM5 architecture. This cost reduction comes 14 weeks after the initial launch of GLM5, which supports both non-MTP and other configurations.
-
Nvidia B200 GPUs deployed in cost-saving inference clusters
Nvidia's B200 GPUs are being deployed in large clusters, utilizing RoCEv2 Ethernet and Tomahawk switches for efficient inference. This setup allows for significant cost savings as more machines are added, indicating a t…
-
IonRouter launches AI inference service with custom IonAttention engine
IonRouter has launched a new inference service designed for high throughput and low cost, utilizing its proprietary IonAttention engine. This engine is capable of multiplexing multiple models on a single GPU, enabling r…