Brief · PulseAugur

TOOL · Together AI blog · English (EN) · 5mo

Introducing AutoJudge: Streamlined inference acceleration via automated dataset curation

Researchers at Together AI have developed AutoJudge, a method for accelerating large language model inference. It automates the curation of task-specific datasets, which enables lossy speculative decoding without manual annotation. AutoJudge identifies the critical tokens that affect downstream quality and achieves up to a 2x speedup over standard speculative decoding with minimal accuracy loss.
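To make the idea of lossy speculative decoding concrete, here is a minimal toy sketch in Python. It is not Together AI's implementation: the function `accept_draft_tokens`, the `is_critical` judge, and the example tokens are all hypothetical, and the judge is a hand-written rule standing in for whatever AutoJudge learns from its curated data. The sketch assumes greedy decoding, where `target[i]` is the token the large model would emit at position `i` given the draft prefix.

```python
from typing import Callable, List, Sequence


def accept_draft_tokens(
    draft: Sequence[str],
    target: Sequence[str],
    is_critical: Callable[[int, str, str], bool],
) -> List[str]:
    """Return the tokens kept from one speculative-decoding window.

    draft[i]  : token proposed by the small draft model at position i
    target[i] : token the large target model would emit at position i
    is_critical(i, draft_tok, target_tok) : hypothetical judge that says
        whether a mismatch at position i would hurt downstream quality
    """
    accepted: List[str] = []
    for i, (d, t) in enumerate(zip(draft, target)):
        if d == t:
            # Models agree: the draft token is accepted for free.
            accepted.append(d)
        elif not is_critical(i, d, t):
            # Lossy step: keep the mismatching draft token anyway.
            accepted.append(d)
        else:
            # Critical mismatch: take the target's token and end the window.
            accepted.append(t)
            break
    return accepted


# Toy data (assumption): the models disagree on a filler word and on a number.
draft = ["The", "answer", "is", "roughly", "41", "."]
target = ["The", "answer", "is", "about", "42", "."]

# Standard speculative decoding: every mismatch is treated as critical.
strict = accept_draft_tokens(draft, target, lambda i, d, t: True)

# Lossy variant: only mismatches involving digits are treated as critical.
lossy = accept_draft_tokens(
    draft, target, lambda i, d, t: d.isdigit() or t.isdigit()
)

print(strict)  # ['The', 'answer', 'is', 'about']
print(lossy)   # ['The', 'answer', 'is', 'roughly', '42']
```

In this toy case the lossy rule keeps one more token per window than the strict rule while still correcting the number. Longer accepted windows mean fewer target-model passes, which is where a speedup over standard speculative decoding would come from.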

IMPACT Accelerates LLM inference by automating dataset curation for speculative decoding, potentially reducing operational costs.

Together AI
speculative decoding
vLLM
large language model
TensorRT-LLM
NeurIPS 2025
AutoJudge