Together AI boosts custom model inference speed, optimizes open-source LLMs

By PulseAugur Editorial · [3 sources] · 2025-08-27 00:00

Together AI has launched Dedicated Container Inference, a service for deploying and running custom generative media models. The platform handles orchestration tasks such as autoscaling, queuing, and traffic isolation, so teams can focus on their model logic. Together AI reports inference speedups of up to 2.6x for some customers. Separately, the company announced up to 2x faster serverless inference for top open-source models, which it attributes to next-generation GPU hardware and optimized kernels.

IMPACT Accelerates deployment and inference for custom and open-source AI models, potentially lowering costs and increasing accessibility for specialized AI applications.

RANK_REASON The cluster announces a new product offering and significant performance improvements for existing services from a notable AI infrastructure provider.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

Together AI blog TIER_1 English(EN) · 2026-02-12 00:00

Introducing Dedicated Container Inference: Delivering 2.6x faster inference for custom AI models

Together AI launches production-grade orchestration for custom AI models with 1.4x–2.6x faster inference.

Together AI blog TIER_1 English(EN) · 2025-12-01 00:00

Together AI delivers fastest inference for the top open-source models

Together AI achieves up to 2x faster inference for top open-source models like Qwen, DeepSeek, and Kimi through GPU optimization, advanced speculative decoding, and FP4 quantization, ranking #1 in speed benchmarks on NVIDIA Blackwell architecture.

Together AI blog TIER_1 English(EN) · 2025-08-27 00:00

DeepSeek-V3.1: Hybrid Thinking Model Now Available on Together AI

Access DeepSeek-V3.1 on Together AI: MIT-licensed hybrid model with thinking/non-thinking modes, 66% SWE-bench Verified, serverless deployment, 99.9% SLA.
