Brief · PulseAugur

TOOL · Together AI blog · English (EN) · 4d

Improved Batch Inference API: Enhanced UI, Expanded Model Support, and 3000× Rate Limit Increase

Together AI has significantly upgraded its Batch Inference API, introducing a more user-friendly interface and expanding model support to all serverless and private deployment models. The update raises rate limits 3000×, from 10 million to 30 billion enqueued tokens per model per user, enabling much larger-scale data processing. These changes aim to make high-throughput workloads more cost-effective and accessible: for most serverless models, batch jobs typically cost 50% of the real-time API price.
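The arithmetic behind these figures can be checked with a short sketch. The two token limits and the 50% batch discount come from the announcement (the discount is described as typical for most serverless models, not universal). The $0.20 per million tokens real-time price is a hypothetical placeholder, not a published Together AI rate, and `limit_increase` and `batch_cost` are illustrative helper names.

```python
# Limits as stated in the brief (enqueued tokens per model per user).
OLD_LIMIT_TOKENS = 10_000_000        # 10 million
NEW_LIMIT_TOKENS = 30_000_000_000    # 30 billion

# Typical batch pricing as stated: 50% of the real-time price for most
# serverless models. Assumption: applied uniformly here for illustration.
BATCH_DISCOUNT = 0.5


def limit_increase(old: int, new: int) -> float:
    """Return how many times larger the new limit is than the old one."""
    return new / old


def batch_cost(tokens: int, realtime_price_per_million: float,
               discount: float = BATCH_DISCOUNT) -> float:
    """Estimate batch cost in dollars from a (hypothetical) real-time price."""
    return tokens / 1_000_000 * realtime_price_per_million * discount


if __name__ == "__main__":
    # 30 billion / 10 million = 3000
    print(f"{limit_increase(OLD_LIMIT_TOKENS, NEW_LIMIT_TOKENS):.0f}x")  # prints 3000x
    # Hypothetical real-time price of $0.20 per million tokens:
    # processing one full queue at the new limit.
    print(f"${batch_cost(NEW_LIMIT_TOKENS, 0.20):,.2f}")  # prints $3,000.00
```

Actual savings depend on the specific model's real-time price and on whether the 50% rate applies to it.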

IMPACT Enables more cost-effective and scalable processing for large AI workloads like synthetic data generation and model evaluation.

Together AI
Inception Labs
Volodymyr Kuleshov
Batch Inference API