Together AI boosts inference speed and launches custom model deployment

By PulseAugur Editorial · [4 sources] · 2025-08-27 00:00

Together AI has launched Dedicated Container Inference, a service for deploying and running custom generative media models. It provides production-grade orchestration, including autoscaling and monitoring, so teams can bring their own Docker containers directly to Together's GPU infrastructure; the company reports 1.4x–2.6x faster inference for custom models. Together AI also announced performance improvements across its inference platform, claiming up to 2x faster inference for open-source LLMs such as Qwen, DeepSeek, and Kimi through GPU optimization, FP4 quantization, and speculative decoding on NVIDIA Blackwell hardware. In addition, Together AI now offers DeepSeek-V3.1, an MIT-licensed hybrid model with thinking and non-thinking modes for fast responses and deeper reasoning, which the company positions as making complex analysis more practical in production.

IMPACT Accelerates deployment of custom AI models and enhances performance for open-source LLMs, potentially lowering inference costs.

RANK_REASON The cluster details new product launches and significant performance claims from a major AI infrastructure provider.


AI-generated summary · Google Gemini · from 4 sources.


COVERAGE [4]

X — Together (inference / OSS) TIER_1 English(EN) · togethercompute · 2026-05-27 00:06

RT @vipulved: Our inference stack, optimized for Blackwells, with a novel attention kernel and many new optimizations has started rolling o…

Together AI blog TIER_1 English(EN) · 2026-02-12 00:00

Introducing Dedicated Container Inference: Delivering 2.6x faster inference for custom AI models

Together AI launches production-grade orchestration for custom AI models with 1.4x–2.6x faster inference.
Together AI blog TIER_1 English(EN) · 2025-12-01 00:00

Together AI delivers fastest inference for the top open-source models

Together AI achieves up to 2x faster inference for top open-source models like Qwen, DeepSeek, and Kimi through GPU optimization, advanced speculative decoding, and FP4 quantization—ranking #1 in speed benchmarks on NVIDIA Blackwell architecture.
Together AI blog TIER_1 English(EN) · 2025-08-27 00:00

DeepSeek-V3.1: Hybrid Thinking Model Now Available on Together AI

Access DeepSeek-V3.1 on Together AI: MIT-licensed hybrid model with thinking/non-thinking modes, 66% SWE-bench Verified, serverless deployment, 99.9% SLA.

