Dynamic Chunking for Diffusion Language Models
Researchers are exploring new methods to improve diffusion language models (DLMs), which can offer faster inference than autoregressive models. Several recent papers introduce techniques to enhance DLM performance, including NAVIRA for decoupled remasking, SARDI for retrieval-augmented generation using discarded tokens, and AXON for supportive token revealing. Another study identifies limitations of DLMs, such as a locality bias and distraction from mask tokens, and proposes a mask-agnostic loss function to improve context comprehension. Additionally, a survey provides a comprehensive overview of the DLM landscape, covering foundational principles, state-of-the-art models, and future research directions.
IMPACT: New techniques aim to improve the speed and accuracy of diffusion language models, potentially making them more competitive with autoregressive models.
- DCDM
- Dynamic Chunking Diffusion Model
- Chunking Attention
- OpenWebText
- RePlaid
- arXiv
- Hugging Face
- Block Approximate Sparse Attention
- Dynamic Chunking Diffusion Models
- FlashAttention
- Diffusion Language Models
- DLM-SWAI
- Boundary-Guided Policy Optimization
- Eso-LMs
- Masked Diffusion Models
- DiffRetriever
- BlockBatch
- dgMARK
- Dynamic Infilling Anchors (DIA)
- PRISM
- AXON
- Masked Diffusion Language Models
- T$^\star$
- Hanchen Xia
- SARDI
- Autoregressive Language Models
- NAVIRA
- Maksim Kryzhanovskiy
- Tianyi Li
- Julianna Piskorz