AdapTive
Together AI has introduced ATLAS, a novel adaptive-learning system for speculative decoding that dynamically improves LLM inference performance without manual tuning. Unlike standard or custom speculators, ATLAS continuously learns from runtime usage and evolving workloads to optimize token drafting in real time. This system achieves significant speedups, reaching up to 500 TPS on DeepSeek-V3.1 and 460 TPS on Kimi-K2, outperforming even specialized hardware like Groq. AI
IMPACT Accelerates LLM inference speed and reduces costs by dynamically optimizing speculative decoding.