Together AI
PulseAugur coverage of Together AI — every cluster mentioning Together AI across labs, papers, and developer communities, ranked by signal.
- developed Oscar 95%
- used by MiniMax AI 90%
- used by cubic metre 90%
- developed DeepSeek V4-Pro 90%
- competes with GLM-5.2 90%
- developed Artificial Analysis 90%
- developed GLM-5.2 90%
- partners with Pearl Research Labs 90%
- uses Deepgram 90%
- partners with Ce Zhang 90%
- uses Gemma-4-31B-it-Pearl 90%
- uses Nvidia Blackwell B200 90%
- 2026-06-25 product_launch Together AI announced that its platform, using the GLM-5.2 model, can generate web applications for a few cents per iteration. source
- 2026-06-22 product_launch Together AI released the Brrrrr inference model. source
- 2026-06-18 partnership Together AI and NVIDIA are co-hosting an event on July 1st at the AI Engineer World's Fair to discuss open models and collective agent intelligence. source
- 2026-06-13 product_launch Together AI launched the MiniMax-M3 multimodal model. source
- 2026-06-12 research_milestone Together AI released benchmarks showing significant performance gains on Blackwell hardware for AI agent infrastructure. source
- 2026-06-10 research_milestone Together AI achieved ISO 27001:2022 certification for its Information Security Management System after a successful audit. source
- 2026-06-09 partnership Together AI partnered with Pax8 to offer AI infrastructure and models to small and medium-sized businesses. source
- 2026-06-01 product_launch Together AI announced a new model called M3. source
- 2026-05-29 product_launch Together AI is now serving the two fastest speech-to-text models, including NVIDIA Parakeet-TDT 0.6B v3. source
- 2026-05-29 product_launch Together AI launched a new open-source AI translation application. source
- 2026-05-22 product_launch Together AI launched updates to its Fine-Tuning Platform, adding support for new LLMs and extending context lengths. source
- 2026-05-22 product_launch Together AI announced the addition of 1,000 NVIDIA H100 and H200 GPUs to its infrastructure. source
- 2026-05-22 product_launch Together AI launched GPU clusters with the NVIDIA Blackwell platform and an optimized kernel collection, achieving significant performance gains. source
- 2026-05-22 product_launch Together AI launched major upgrades to its Batch Inference API. source
Together AI significantly bolsters inference capacity with H100/H200 GPU expansion
The addition of one thousand NVIDIA H100 and H200 GPUs to Together AI's infrastructure represents a substantial investment in inference capabilities. This move directly supports the growing demand for high-throughput AI model serving and is likely intended to power both their internal services and external customer workloads.
Together AI to offer ATLAS as a distinct inference optimization service
Given the significant performance gains demonstrated by ATLAS, Together AI may soon offer this adaptive-learning inference system as a standalone service or an add-on feature for their existing GPU offerings. This would allow customers to leverage ATLAS's dynamic optimization without needing to manage the underlying infrastructure themselves.
Together AI's ATLAS system demonstrates inference speed on par with specialized hardware
Together AI's newly launched ATLAS system, an adaptive-learning inference engine, is showing remarkable performance, achieving up to 500 TPS on DeepSeek-V3.1. This performance rivals that of specialized hardware like Groq, suggesting Together AI is effectively optimizing LLM inference beyond standard GPU capabilities.
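To put the throughput figure in practical terms, here is a minimal Python sketch that converts a decode rate into wall-clock time. It assumes "TPS" means output tokens per second for a single request and that the rate holds steady for the whole response. The function name and the 2,000-token response length are hypothetical and chosen only for illustration; "up to 500 TPS" is a reported peak, so real responses may take longer.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate decode time for num_tokens at a steady throughput.

    Assumption: throughput is constant and excludes queueing and
    time-to-first-token, which the source does not report.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Hypothetical 2,000-token response at the reported peak of 500 TPS.
print(generation_seconds(2000, 500.0))  # 4.0
```

The same function can be reused with any other reported rate to compare systems on an equal response length.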
Together AI to integrate NVIDIA Blackwell features into all core services
The 90% training speed boost achieved with NVIDIA Blackwell and custom kernels indicates a deep integration. It's likely Together AI will leverage Blackwell's capabilities across their entire platform, including their new instant clusters and fine-tuning services, to offer a performance edge over competitors.
Together AI's ATLAS system shows strong performance against specialized hardware
The reported performance of Together AI's ATLAS system, achieving up to 500 TPS on DeepSeek-V3.1 and outperforming specialized hardware like Groq, is a significant technical achievement. This suggests their adaptive inference approach is highly effective and could set a new benchmark for LLM inference speed and efficiency.
-
MiniMax AI and Together AI to Discuss Large-Scale Agent Infrastructure
MiniMax AI and Together AI will participate in a discussion about the challenges of running large-scale AI agents. The conversation will cover training decisions for long-context reasoning and tool use, as well as the i…
-
LiteLLM vs. OpenRouter: A Deep Dive into LLM Proxy Architectures
LiteLLM and OpenRouter serve different purposes in LLM access, with LiteLLM being a self-hosted, open-source proxy and OpenRouter a managed cloud aggregator. LiteLLM offers extensive model provider support and load bala…
-
Open-source AI inference demand drives strategic model choice, says Together AI
The increasing demand for AI model inference is driving a strategic shift towards model selection, with organizations prioritizing frontier-quality models that offer improved tokenomics, cost control, and deployment fle…
-
Together AI's GLM-5.2 aids web app iteration, user shares workflow
Together AI is promoting its GLM-5.2 model, highlighting its utility for web application development. A user, nutlope, shared their workflow which involves generating multiple variations of an idea with GLM-5.2 to selec…
-
Sail Research launches with $80M to optimize AI infrastructure for autonomous agents
Sail Research, a startup founded by former Apple and Together AI engineers, has launched with $80 million in funding to address the infrastructure needs of long-running AI agents. The company's platform is designed to o…
-
Together AI claims world's fastest speech-to-text stack
Together AI has developed a speech-to-text system that achieves industry-leading speed. Their 'parakeet' model, running on Together's infrastructure, processes audio at approximately 302 seconds of audio per second of p…
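As a rough illustration of what that speed factor implies, the Python sketch below estimates processing time for a recording. It assumes the truncated figure means about 302 seconds of audio handled per second of processing, and that the factor applies uniformly to long files. The function name and the one-hour example are hypothetical.

```python
def transcription_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Estimate processing time for a recording at a given speed factor.

    Assumption: speed_factor is seconds of audio processed per second of
    compute, and it stays constant regardless of recording length.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


# A hypothetical one-hour recording at the reported factor of roughly 302.
one_hour = 3600
print(round(transcription_seconds(one_hour, 302.0), 2))  # 11.92
```

Under these assumptions, an hour of audio would take about 12 seconds to process, before any network or queueing overhead.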
-
Together AI offers web app generation for cents with GLM-5.2
Together AI has announced that its platform, utilizing the GLM-5.2 model, can generate web applications for a minimal cost per iteration. This low cost aims to enable developers to experiment more freely with different …
-
Together AI launches GLM Arena to benchmark GLM 5.2 against Anthropic's Opus 4.8
Together AI has launched GLM Arena, a platform for evaluating language models. The arena features tests comparing GLM 5.2 against Anthropic's Opus 4.8, showing that GLM 5.2 can produce twice the tokens at a lower cost a…
-
LLM Inference Pricing Compared Across 7 Providers, Highlighting Caching Costs
A user compiled a spreadsheet comparing LLM inference pricing across seven providers, including OpenAI, Anthropic, Cohere, and Mistral AI. The comparison focuses on input/output token pricing, context windows, and cache…
-
Together AI sees 400T token adoption for open-source models
Together AI is seeing significant adoption of its infrastructure, with teams processing 400 trillion tokens on open-source models. This surge in usage is driven by a desire for frontier-level quality, improved tokenomic…
-
Together AI releases open-source Parallel Kernel Builder for LLM inference
Together AI has released Parallel Kernel Builder (PKB), an open-source tool designed to optimize inference performance for large language models. PKB can identify and generate novel kernels, such as those for NeMo vocab…
-
Together AI's GLM 5.2 outperforms Anthropic's Opus on speed and cost
Together AI has released results from 10 tests comparing their GLM 5.2 model against Anthropic's Opus model. The tests indicated that GLM 5.2 produced twice the tokens of Opus, was faster, and cost three times less, whi…
-
Frontier LLMs struggle with multi-GPU kernel generation, new benchmark reveals
A new benchmark called ParallelKernelBench (PKB) has been developed to evaluate the ability of frontier large language models to generate efficient multi-GPU kernels. Testing models like GPT-5.5, Gemini 3 Pro, and Opus …
-
Together AI releases free Brrrrr inference model
Together AI has released Brrrrr, a new inference model that is available for free use. Early benchmarks show the model achieving 131 tokens per second.
-
Together AI's Blind Test challenges users to distinguish between GLM-5.2 and Opus 4.8
Together AI has launched "The Blind Test," a challenge designed to compare the capabilities of two large language models: GLM-5.2 and Opus 4.8. The test presents users with two landing pages, each generated by one of th…
-
Together AI deploys NVIDIA GB300 NVL72 for next-gen AI inference infrastructure
Together AI is deploying next-generation infrastructure designed for large-scale AI inference and reasoning. This new AI Factory utilizes NVIDIA GB300 NVL72 systems, which feature high-density compute and advanced cooli…
-
Together AI offers free GLM-5.2 access via Together Chat
Together AI is offering free access to test GLM-5.2 through its platform, Together Chat. Users can begin prompting with the model immediately without needing to set up an API. The service is hosted on secure North Ameri…
-
Together AI offers free access to GLM-5.2 model on Together Chat
Together AI is offering free access to its GLM-5.2 model through its Together Chat platform. This initiative allows users to experiment with the GLM-5.2 model without charge via the chat interface.
-
Together AI releases GLM-5.2, showcasing fast inference and reasoning capabilities
Together AI has released GLM-5.2, an open-source model capable of complex reasoning and code patching, which it touts as a significant advance on tasks previously handled by closed models. The model demonstrated its cap…
-
Together AI's voice agent interacts with screens for code editing
Together AI has demonstrated a voice agent capable of interacting with a user's screen, performing tasks like website design review and code editing. This system integrates speech-to-text, voice processing, and reasonin…