Together AI has launched new GPU clusters featuring NVIDIA's Blackwell platform, offering significant speedups for AI training and inference. These clusters, powered by the Together Kernel Collection, deliver up to 90% faster training than previous-generation NVIDIA H100 hardware, processing over 15,000 tokens per second for large models. Early access customers like Salesforce and Zoom have reported substantial performance gains, with some seeing double the training speed. Together AI's optimization efforts span custom kernels, inference engines, and speculative decoding, aiming to redefine efficiency in AI model development and deployment.
IMPACT Accelerates AI training and inference, potentially lowering costs and increasing the pace of model development and deployment for enterprises.
RANK_REASON This cluster details a significant infrastructure upgrade and performance improvement for AI workloads by a major cloud provider, leveraging new hardware from a leading chip manufacturer.
- FlashAttention-3
- Llama-2-70B
- MLPerf Inference v4.1
- NVIDIA Blackwell
- NVIDIA HGX B200
- NVIDIA HGX H100
- Together AI
- Together Kernel Collection
- Tri Dao
- DeepSeek-R1-0528
- InVideo
- Salesforce
- Zoom
AI-generated summary · Google Gemini · from 3 sources.