Together AI has introduced ATLAS, a novel adaptive-learning system for speculative decoding that dynamically improves LLM inference performance without manual tuning. Unlike standard or custom speculators, ATLAS continuously learns from runtime usage and evolving workloads to optimize token drafting in real time. This system achieves significant speedups, reaching up to 500 TPS on DeepSeek-V3.1 and 460 TPS on Kimi-K2, outperforming even specialized hardware like Groq. AI
Summary written by gemini-2.5-flash-lite from 1 source. How we write summaries →
IMPACT Accelerates LLM inference speed and reduces costs by dynamically optimizing speculative decoding.
RANK_REASON Product launch of a novel inference optimization technique for LLMs. [lever_c_demoted from significant: ic=1 ai=1.0]