DAPD: Dependency-Aware Parallel Decoding via Attention for Diffusion LLMs
Researchers have introduced Dependency-Aware Parallel Decoding (DAPD), a method for accelerating decoding in Diffusion Large Language Models (dLLMs). DAPD uses self-attention to construct a conditional dependency graph among tokens, then unmasks in parallel the tokens that form an independent set in that graph. The approach is training-free: it needs no auxiliary model and no retraining. It improves the accuracy-steps trade-off and makes better use of the any-order generation capability of dLLMs.
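The idea of unmasking an independent set of an attention-derived graph can be illustrated with a small sketch. This is not the authors' implementation: the fixed threshold, the symmetrization by maximum, the greedy confidence-ordered selection, and the names `build_dependency_graph` and `greedy_independent_set` are all assumptions made for illustration, and the attention matrix is toy data.

```python
from typing import Dict, List, Sequence, Set


def build_dependency_graph(
    attn: Sequence[Sequence[float]],
    masked: Sequence[int],
    threshold: float,
) -> Dict[int, Set[int]]:
    """Connect two masked positions if either attends strongly to the other.

    attn[i][j] is assumed to be the attention weight from position i to
    position j, already averaged over heads and layers (an assumption).
    """
    edges: Dict[int, Set[int]] = {i: set() for i in masked}
    for a, i in enumerate(masked):
        for j in masked[a + 1:]:
            # Symmetrize with max: a strong link in either direction counts.
            if max(attn[i][j], attn[j][i]) >= threshold:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def greedy_independent_set(
    edges: Dict[int, Set[int]],
    confidence: Dict[int, float],
) -> List[int]:
    """Pick positions in descending confidence, skipping neighbors of picks."""
    chosen: List[int] = []
    blocked: Set[int] = set()
    for i in sorted(edges, key=lambda k: confidence[k], reverse=True):
        if i not in blocked:
            chosen.append(i)
            blocked |= edges[i]  # neighbors must wait for a later step
    return chosen


# Toy example: position 0 is already decoded, positions 1-3 are masked.
attn = [
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.30, 0.40, 0.10],
    [0.10, 0.50, 0.30, 0.10],
    [0.30, 0.05, 0.05, 0.60],
]
masked = [1, 2, 3]
confidence = {1: 0.9, 2: 0.8, 3: 0.6}  # hypothetical per-token confidences

graph = build_dependency_graph(attn, masked, threshold=0.3)
to_unmask = greedy_independent_set(graph, confidence)
print(to_unmask)
```

In this toy case positions 1 and 2 attend strongly to each other, so only one of them is unmasked in the current step, while position 3 has no strong link to either and can be unmasked alongside it.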
IMPACT: Speeds up inference for Diffusion LLMs by unmasking several tokens per decoding step, which could encourage wider adoption of these models.