Introducing AutoJudge: Streamlined inference acceleration via automated dataset curation
Researchers at Together AI have developed AutoJudge, a novel method to accelerate large language model inference. This technique automates the curation of task-specific datasets, enabling lossy speculative decoding without manual annotation. AutoJudge identifies critical tokens that impact downstream quality, achieving up to a 2x speedup over standard speculative decoding with minimal accuracy loss.
IMPACT: Accelerates LLM inference by automating dataset curation for speculative decoding, potentially reducing operational costs.
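To make the idea of "lossy" speculative decoding concrete, here is a minimal sketch of a greedy verification step. This is not AutoJudge's implementation. It assumes that `draft` holds the draft model's proposed tokens and that `target` holds the target model's greedy prediction at each of those positions, computed on the draft prefix. The `is_important` callable is a hypothetical placeholder for whatever judge decides whether a mismatch matters. Standard speculative decoding stops at the first mismatch. The lossy variant keeps a mismatched draft token whenever the judge marks it as unimportant, so more tokens are accepted per target-model call.

```python
from typing import Callable, List, Sequence

# Hypothetical judge signature: (position, draft_token, target_token) -> bool
Judge = Callable[[int, int, int], bool]


def lossy_verify(draft: Sequence[int], target: Sequence[int], is_important: Judge) -> List[int]:
    """Accept draft tokens left to right (greedy verification sketch).

    Matching tokens are always accepted. On a mismatch, the draft token is
    kept if the judge deems the mismatch unimportant (the "lossy" step).
    Otherwise the target token is emitted and verification stops, as in
    standard speculative decoding.
    """
    accepted: List[int] = []
    for pos, (d, t) in enumerate(zip(draft, target)):
        if d == t:
            accepted.append(d)      # exact agreement: accept as usual
        elif not is_important(pos, d, t):
            accepted.append(d)      # unimportant mismatch: tolerate it
        else:
            accepted.append(t)      # critical token: fall back to target
            break
    return accepted


if __name__ == "__main__":
    draft = [1, 2, 3, 4]
    target = [1, 5, 3, 6]
    # Toy judge (an assumption): only a mismatch at position 3 matters.
    print(lossy_verify(draft, target, lambda pos, d, t: pos == 3))  # [1, 2, 3, 6]
    # A judge that treats every mismatch as critical recovers standard behavior.
    print(lossy_verify(draft, target, lambda pos, d, t: True))      # [1, 5]
```

The speedup in this sketch comes entirely from how often the judge says "unimportant". A judge that flags every mismatch reduces to standard greedy speculative decoding, which is the baseline the reported speedup is measured against.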