Together AI boosts Batch Inference API with 3000x rate limit increase

By PulseAugur Editorial · [1 source] · 2026-05-22 15:59

Together AI has upgraded its Batch Inference API with a more user-friendly interface and support for all serverless and private deployment models. The update raises rate limits 3000x, from 10 million to 30 billion enqueued tokens per model per user, enabling much larger-scale data processing. The changes aim to make high-throughput workloads more cost-effective and accessible: for most serverless models, batch jobs typically cost 50% of the real-time API price.
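The figures above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the real-time price passed in is a made-up placeholder rather than a published Together AI rate, and the 50% figure is the "typical" discount quoted in the summary, which may not hold for every model.

```python
# Figures taken from the summary above.
OLD_LIMIT_TOKENS = 10_000_000        # previous cap on enqueued tokens per model per user
NEW_LIMIT_TOKENS = 30_000_000_000    # new cap on enqueued tokens per model per user
BATCH_DISCOUNT = 0.5                 # "typically 50%" of real-time pricing; an assumption per model


def limit_multiplier(old_limit: int, new_limit: int) -> float:
    """Return how many times larger the new limit is than the old one."""
    return new_limit / old_limit


def fits_in_queue(tokens: int, limit: int = NEW_LIMIT_TOKENS) -> bool:
    """Check whether a job of `tokens` enqueued tokens fits under a limit."""
    return tokens <= limit


def estimate_batch_cost(tokens: int,
                        realtime_price_per_million: float,
                        discount: float = BATCH_DISCOUNT) -> float:
    """Estimate batch cost from a (hypothetical) real-time price per 1M tokens."""
    return tokens / 1_000_000 * realtime_price_per_million * discount


if __name__ == "__main__":
    job_tokens = 2_000_000_000  # an example 2B-token workload
    print(limit_multiplier(OLD_LIMIT_TOKENS, NEW_LIMIT_TOKENS))
    print(fits_in_queue(job_tokens, OLD_LIMIT_TOKENS), fits_in_queue(job_tokens))
    # 1.00 per million tokens is a placeholder price, not a real quote.
    print(estimate_batch_cost(job_tokens, realtime_price_per_million=1.00))
```

Under these assumptions, a 2-billion-token job would have exceeded the old 10-million-token cap but fits within the new one, and its estimated batch cost is half of what the same token volume would cost at the placeholder real-time price.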

IMPACT Enables more cost-effective and scalable processing for large AI workloads like synthetic data generation and model evaluation.

RANK_REASON Product update to an existing API service.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

Together AI blog TIER_1 English(EN) · 2026-05-22 15:59

Improved Batch Inference API: Enhanced UI, Expanded Model Support, and 3000× Rate Limit Increase

Our new Batch Inference API makes large-scale AI workloads simpler, faster, and cheaper. With a streamlined UI, universal model support, and 3000× higher rate limits—now up to 30B tokens—you can process massive datasets at half the cost of real-time APIs.
