
TOOL · Hugging Face Daily Papers · 2w

Draft-OPD: On-Policy Distillation for Speculative Draft Models

Researchers have developed Draft-OPD, a method for improving the efficiency of speculative decoding in large language models. It targets the mismatch between how a draft model is trained offline and how it is used at inference time, and closes that gap with on-policy distillation. Draft-OPD combines target-assisted rollouts with error replay so that the draft model learns from both accepted and rejected proposals, concentrating on the errors that limit speculative acceptance. Experiments show the method can achieve more than 5x lossless acceleration for language models.
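
For readers unfamiliar with the accept/reject step that the summary refers to, here is a minimal sketch of generic speculative decoding under greedy verification. It is not the Draft-OPD training procedure, and it is not taken from the paper: the function name `verify_greedy` and the plain-list inputs are assumptions made for illustration. It shows why the output is lossless (the target model's token always wins at the first disagreement) and where a rejected proposal comes from.

```python
from typing import List, Tuple


def verify_greedy(draft: List[int], target: List[int]) -> Tuple[List[int], int]:
    """Return (emitted tokens, number of accepted draft tokens).

    `draft` holds k tokens proposed by the draft model. `target` holds the
    k + 1 tokens the target model picks greedily at the same positions in a
    single verification pass. Both are hypothetical inputs for this sketch.
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must have exactly one more token than draft")

    # Accept draft tokens until the first disagreement with the target.
    accepted = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        accepted += 1

    # Always emit the target's token at the stopping position: a correction
    # after a rejection, or a bonus token when every draft token matched.
    # The result is therefore identical to plain greedy decoding.
    emitted = draft[:accepted] + [target[accepted]]
    return emitted, accepted


if __name__ == "__main__":
    tokens, n_accepted = verify_greedy([5, 7, 9], [5, 7, 2, 4])
    print(tokens, n_accepted)
```

In this sketch each verification pass emits `accepted + 1` tokens, so the speedup depends on how long the accepted prefix is. The first mismatched position is an example of the kind of draft error that hinders acceptance.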

IMPACT Enhances LLM inference speed, potentially accelerating deployment and reducing computational costs for AI applications.

speculative decoding
large language model
on-policy distillation
EAGLE3
Draft-OPD