Together AI
PulseAugur coverage of Together AI — every cluster mentioning Together AI across labs, papers, and developer communities, ranked by signal.
- uses Gemma-4-31B-it-Pearl 90%
- uses Deepgram 90%
- founded by Vipul Ved Prakash 90%
- partners with Pearl Research Labs 90%
- uses Nvidia Blackwell B200 90%
- developed Together Code Interpreter 90%
- used by NVIDIA Parakeet-TDT 0.6B v3 90%
- developed Gemma-4-31B-it-Pearl 90%
- employs Dan Fu 90%
- partners with MiniMax AI 80%
- used by MiniMax AI 75%
- used by DeepSeek-R1 70%
- 2026-06-13 product_launch Together AI launched the MiniMax-M3 multimodal model. source
- 2026-06-12 research_milestone Together AI released benchmarks showing significant performance gains on Blackwell hardware for AI agent infrastructure. source
- 2026-06-10 research_milestone Together AI achieved ISO 27001:2022 certification for its Information Security Management System after a successful audit. source
- 2026-06-09 partnership Together AI partnered with Pax8 to offer AI infrastructure and models to small and medium-sized businesses. source
- 2026-06-01 product_launch Together AI is announcing a new model called M3. source
- 2026-05-29 product_launch Together AI is now serving the two fastest speech-to-text models, including NVIDIA Parakeet-TDT 0.6B v3. source
- 2026-05-29 product_launch Together AI launched a new open-source AI translation application. source
- 2026-05-22 product_launch Together AI launched updates to its Fine-Tuning Platform, adding support for new LLMs and extending context lengths. source
- 2026-05-22 product_launch Together AI announced the addition of 1,000 NVIDIA H100 and H200 GPUs to its infrastructure. source
- 2026-05-22 product_launch Together AI launched GPU clusters with NVIDIA Blackwell platform and optimized kernel collection, achieving significant performance gains. source
- 2026-05-22 product_launch Together AI launched major upgrades to its Batch Inference API. source
- 2026-05-22 product_launch Together AI released FlashAttention-3 and FlashAttention-4, optimized attention mechanisms for GPUs. source
- 2026-05-22 product_launch Together AI launched access to the Qwen3.7-Max model. source
- 2026-05-15 partnership Together AI and Pearl Research Labs formed a partnership to integrate blockchain for AI inference cost reduction. source
Together AI's ATLAS system demonstrates inference speed on par with specialized hardware
Together AI's newly launched ATLAS system, an adaptive-learning inference engine, reportedly reaches up to 500 tokens per second (TPS) on DeepSeek-V3.1. That rivals specialized hardware such as Groq and suggests Together AI is pushing LLM inference beyond what standard GPU serving typically delivers.
Together AI significantly bolsters inference capacity with H100/H200 GPU expansion
The addition of one thousand NVIDIA H100 and H200 GPUs to Together AI's infrastructure represents a substantial investment in inference capabilities. This move directly supports the growing demand for high-throughput AI model serving and is likely intended to power both their internal services and external customer workloads.
Together AI to offer ATLAS as a distinct inference optimization service
Given the significant performance gains demonstrated by ATLAS, Together AI may soon offer this adaptive-learning inference system as a standalone service or an add-on feature for their existing GPU offerings. This would allow customers to leverage ATLAS's dynamic optimization without needing to manage the underlying infrastructure themselves.
Together AI to integrate NVIDIA Blackwell features into all core services
The 90% training speed boost achieved with NVIDIA Blackwell and custom kernels indicates a deep integration. It's likely Together AI will leverage Blackwell's capabilities across their entire platform, including their new instant clusters and fine-tuning services, to offer a performance edge over competitors.
Together AI's ATLAS system shows strong performance against specialized hardware
The reported performance of Together AI's ATLAS system, achieving up to 500 TPS on DeepSeek-V3.1 and outperforming specialized hardware like Groq, is a significant technical achievement. This suggests their adaptive inference approach is highly effective and could set a new benchmark for LLM inference speed and efficiency.
-
Together AI releases Frontier Agents open-source inference model
Together AI has released a new open-source model called Frontier Agents. This model is designed for inference and is the latest development from their Frontier Agents Research team, led by James Y. Zou.
-
Together AI releases open-source Hot Wings model for inference
Together AI has released a new open-source model called Hot Wings, designed for inference. The company showcased this model at NVIDIA's GTC conference, highlighting its capabilities. This release aims to provide a power…
-
Ideogram releases open-weight Ideogram 4 model with 2K resolution
Ideogram has released Ideogram 4, an open-weight text-to-image model that excels in design-oriented tasks and text rendering. The model offers native 2K resolution and advanced features like bounding box control and str…
-
Together AI serves fastest speech-to-text models
Together AI is now serving the two fastest speech-to-text models, according to Artificial Analysis. The NVIDIA Parakeet-TDT 0.6B v3 model can transcribe 20 hours of audio in less than 10 seconds. This performance is ach…
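To put the transcription claim in perspective, the quoted figures can be turned into a real-time speedup factor. This is a back-of-the-envelope sketch: the helper name `real_time_factor` is hypothetical, and because the summary says "less than 10 seconds", the result is a lower bound rather than a measured value.

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Figures quoted in the summary: 20 hours of audio in under 10 seconds.
audio_seconds = 20 * 3600          # 20 hours expressed in seconds
processing_upper_bound = 10.0      # "less than 10 seconds", so an upper bound

lower_bound = real_time_factor(audio_seconds, processing_upper_bound)
print(f"At least {lower_bound:,.0f}x faster than real time")
```

The division gives 72,000 / 10 = 7,200, so the claim implies throughput of at least 7,200 times real time.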
-
Together AI releases open-source AI translation app
Together AI has released an open-source AI translation application, built using their inference tools. The application is designed to be fun and accessible for users.
-
Together AI builds world's fastest speech-to-text stack
Together AI has developed a highly efficient speech-to-text system, significantly outperforming existing models in speed. Their approach addresses the unique challenges of audio data processing, which is substantially l…
-
Together AI open-sources OSCAR for efficient LLM serving
Together AI has open-sourced OSCAR, a new system for 2-bit KV cache quantization. This technique aims to improve the efficiency of serving large language models, particularly those with long context windows. The develop…
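The summary does not describe how OSCAR quantizes the cache, so the following is only a generic illustration of what storing values in 2 bits means: each value is mapped to one of four levels, and four codes are packed into a single byte. The function names and the min/max scaling scheme are assumptions for illustration, not OSCAR's actual method.

```python
from typing import List, Tuple


def quantize_2bit(values: List[float]) -> Tuple[bytes, float, float]:
    """Map each value to one of 4 uniform levels and pack 4 codes per byte."""
    lo, hi = min(values), max(values)
    # Four levels span three steps; fall back to 1.0 if all values are equal.
    scale = (hi - lo) / 3 or 1.0
    codes = [min(3, max(0, round((v - lo) / scale))) for v in values]
    packed = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, code in enumerate(codes[i:i + 4]):
            byte |= code << (2 * j)  # each code occupies 2 bits
        packed.append(byte)
    return bytes(packed), lo, scale


def dequantize_2bit(packed: bytes, lo: float, scale: float, n: int) -> List[float]:
    """Unpack n 2-bit codes and map them back to approximate values."""
    out = []
    for i in range(n):
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        out.append(lo + code * scale)
    return out


# Hypothetical slice of cached values, used only to exercise the round trip.
values = [0.0, 0.1, 0.5, 0.9, 1.0, -0.3]
packed, lo, scale = quantize_2bit(values)
restored = dequantize_2bit(packed, lo, scale, len(values))
print(len(packed), "bytes for", len(values), "values")
```

With uniform levels, each restored value differs from the original by at most half a step (`scale / 2`). Production systems typically quantize per group or per channel to keep that error small.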
-
LLM API keys leaking from GitHub Actions, CheckAPIs tool emerges
Many organizations are inadvertently leaking API keys for large language models by storing them insecurely in code repositories and CI/CD pipelines. Unlike traditional secrets, these LLM keys are often not rotated and c…
-
Together AI adds 1,000 H100/H200 GPUs for inference
Together AI has significantly expanded its GPU capacity by adding one thousand NVIDIA H100 and H200 instances. These powerful GPUs are now available through Together's on-demand GPU clusters and dedicated endpoint servi…
-
Together AI boosts Batch Inference API with 3000x rate limit increase
Together AI has significantly upgraded its Batch Inference API, introducing a more user-friendly interface and expanding model compatibility to include all serverless and private deployment models. The update dramatical…
-
Together AI launches self-service GPU clusters for AI development
Together AI has launched Together Instant Clusters, a new service providing readily available, self-service GPU clusters for AI development and deployment. This offering aims to simplify the complex process of setting u…
-
Together AI launches adaptive LLM inference system ATLAS
Together AI has introduced ATLAS, a novel adaptive-learning system for speculative decoding that dynamically improves LLM inference performance without manual tuning. Unlike standard or custom speculators, ATLAS continu…
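For readers unfamiliar with speculative decoding, the basic loop can be sketched with toy models. This is a minimal greedy variant under stated assumptions: the "models" are plain functions invented for illustration, the verification step is written as sequential calls (real systems verify all drafted tokens in one batched forward pass), and nothing here models ATLAS's adaptive-learning behavior.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to the next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: generate n_new tokens one at a time with a single model."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Draft k tokens cheaply, keep the prefix the target model agrees with."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target model checks each proposed position in order.
        for i, tok in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if expected != tok:
                # Keep the agreed prefix, then the target's own token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Every draft token was accepted; the target adds one more.
            tokens.extend(proposal)
            tokens.append(target(tokens))
    return tokens[:goal]


def toy_target(prefix: List[int]) -> int:
    """Hypothetical 'large' model: a fixed deterministic rule."""
    return (prefix[-1] * 3 + 1) % 7


def toy_draft(prefix: List[int]) -> int:
    """Hypothetical 'small' model: agrees with the target except after token 5."""
    return 0 if prefix[-1] == 5 else toy_target(prefix)


spec = speculative_decode(toy_target, toy_draft, [1], n_new=12, k=4)
base = greedy_decode(toy_target, [1], n_new=12)
print(spec == base)
```

Because every kept token is one the target model would have chosen itself, the output matches plain greedy decoding with the target. The speedup in practice depends on how often the draft model's guesses are accepted, which is the quantity an adaptive system would try to improve.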
-
Together AI releases FlashAttention-3 and -4 for faster LLM processing
Together AI has released FlashAttention-3 and FlashAttention-4, significant upgrades to their GPU-accelerated attention mechanism for large language models. FlashAttention-3, designed for Hopper GPUs, achieves up to 75%…
-
Together AI offers Alibaba's Qwen3.7-Max with 1M context
Together AI is now offering access to Alibaba's Qwen3.7-Max model, a flagship offering designed for the agent era. This model boasts a 1 million token context window and demonstrates leading performance in areas such as…
-
MiniMax AI launches 600+ new voices via Speech 2.8 Turbo on Together AI
MiniMax AI has released over 600 new voices through its Speech 2.8 Turbo model. These voices are now accessible on the Together AI platform. This expansion aims to provide a wider range of synthetic speech options.
-
Cursor AI launches Composer 2.5 with Together AI partnership
Together AI has partnered with Cursor AI to launch Composer 2.5, a significant advancement for agentic coding models. This new version is noted for its speed and quality, pushing the boundaries of what coding agents can…
-
Together AI hosts MLSys 2026 social event
Together AI is hosting an event called "Inference After Dark" during the MLSys 2026 conference. The event will take place on Tuesday, May 19th, from 7:30 PM to 10:00 PM at Tavern Hall in Bellevue, WA. It is intended as …
-
Together AI releases Pearl-powered Gemma-4-31B-it-Pearl model
Together AI has released Gemma-4-31B-it-Pearl, an open-source model with enhanced capabilities. This model supports a 32K context window, configurable thinking processes, function calling, and JSON mode. It marks Togeth…
-
Pearl Labs partners with Together AI for inference optimization
Pearl Research Labs has announced its first major enterprise partnership with Together AI, focusing on optimizing inference workloads. This collaboration aims to transform hyperscalers' inference capital expenditures in…
-
Together AI launches Pearl-integrated Gemma model with Proof of Useful Work
Together AI has released Gemma-4-31B-it-Pearl, an instruction-tuned model based on Gemma 4 31B. This model integrates the Pearl Network's Proof of Useful Work protocol, which generates proofs from existing matrix multip…