New research enhances diffusion language model efficiency and scalability

By PulseAugur Editorial Team · [11 sources] · 2026-05-15 06:56

Researchers are exploring new methods to improve the efficiency and scalability of diffusion language models (DLMs) for generating long sequences of text. One approach, Block Approximate Sparse Attention (BA-Att), accelerates attention computation by downsampling the attention space, achieving significant speedups while maintaining near full-attention performance. Another development, Dynamic Chunking Diffusion Models (DCDM), replaces fixed positional blocks with content-defined semantic chunks to better capture sequence structure. Additionally, advances in continuous diffusion models, such as RePlaid, demonstrate competitive performance against discrete DLMs, suggesting they are a viable and scalable alternative.
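The summary says only that BA-Att downsamples the attention space; the truncated abstract does not give the algorithm. As a rough illustration of block-level downsampling in general, the sketch below mean-pools queries and keys per block, scores block pairs on that coarse map, and runs exact attention only inside the top-scoring key blocks. The function names and the `block`/`top_k` parameters are hypothetical, and this is a generic block-sparse scheme, not BA-Att itself.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def full_attention(q, k, v):
    # Dense bidirectional attention (no causal mask), as in a DLM denoiser.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def block_sparse_attention(q, k, v, block=4, top_k=2):
    """Hypothetical block-sparse approximation: choose key blocks from a
    downsampled (mean-pooled) score map, then attend exactly within them."""
    n, d = q.shape
    assert n % block == 0, "sequence length must be a multiple of block"
    nb = n // block
    # Downsample: one mean-pooled representative per block, shape (nb, d).
    q_blocks = q.reshape(nb, block, d).mean(axis=1)
    k_blocks = k.reshape(nb, block, d).mean(axis=1)
    # Coarse (nb, nb) block-to-block score map.
    coarse = q_blocks @ k_blocks.T / np.sqrt(d)
    # For each query block, keep the top_k highest-scoring key blocks.
    keep = np.argsort(-coarse, axis=1)[:, :top_k]
    out = np.zeros((n, v.shape[-1]))
    for i in range(nb):
        rows = slice(i * block, (i + 1) * block)
        cols = np.concatenate([np.arange(j * block, (j + 1) * block) for j in keep[i]])
        # Exact attention restricted to the selected key columns.
        out[rows] = softmax(q[rows] @ k[cols].T / np.sqrt(d)) @ v[cols]
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
approx = block_sparse_attention(q, k, v, block=4, top_k=2)  # 2 of 4 key blocks
exact = full_attention(q, k, v)
print(np.abs(approx - exact).max())  # approximation error for this random input
```

Setting `top_k` to the number of blocks recovers dense attention exactly, which is a convenient sanity check; with a smaller `top_k`, each query block scores only `top_k * block` keys instead of all `n`.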

Impact: New techniques promise faster and more scalable text generation from diffusion models, potentially enabling longer and more coherent outputs.

Ranking rationale: Multiple arXiv papers detail novel methods for improving diffusion language models.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 11 sources.

Sources [11]

arXiv cs.CL TIER_1 English(EN) · Linye Wei, Zixiang Luo, Pingzhi Tang, Meng Li · 2026-05-25 04:00

TEAM: Spatio-Temporal Consistency-Guided Expert Activation Acceleration for MoE Diffusion Language Models

arXiv:2602.08404v2 Announce Type: replace Abstract: Diffusion large language models (dLLMs) have recently gained significant attention due to their inherent support for parallel decoding. Building on this paradigm, Mixture-of-Experts (MoE) dLLMs with autoregressive (AR) initializ…
arXiv cs.CL TIER_1 English(EN) · Shubham Parashar, Atharv Chagi, Jacob Helwig, Lakshmi Jotsna, Sushil Vemuri, James Caverlee, Dileep Kalathil, Shuiwang Ji · 2026-05-25 04:00

Learnability-Informed Fine-Tuning of Diffusion Language Models

arXiv:2605.22939v1 Announce Type: new Abstract: We aim to improve the reasoning capabilities of diffusion language models (DLMs). While SFT is a popular post-training recipe for autoregressive models, its use in DLMs faces challenges and can even hurt performance, though the unde…
arXiv cs.LG TIER_1 English(EN) · Chen-Hao Chao, Wei-Fang Sun, Junwei Quan, Chun-Yi Lee, Rahul G. Krishnan · 2026-05-22 04:00

MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Scaling of Diffusion Language Models

arXiv:2603.16077v3 Announce Type: replace Abstract: Masked diffusion models (MDM) exhibit superior generalization when learned using a Partial masking scheme (Prime). This approach converts tokens into sub-tokens and models the diffusion process at the sub-token level. We identif…
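The abstract states that Prime converts tokens into sub-tokens and runs the masked diffusion process at the sub-token level, and the v2 title mentions binary encoding. The sketch below assumes the simplest version of that idea: a token id written as fixed-width base-2 digits, with each digit masked independently so a token can be partially observed. The names and the `MASK` value are hypothetical, and the index shuffling named in the title is not modeled.

```python
import random

MASK = -1  # hypothetical mask id for a single sub-token

def to_subtokens(token_id, base=2, length=16):
    """Fixed-width base-`base` digits of a token id, most significant first."""
    digits = []
    for _ in range(length):
        digits.append(token_id % base)
        token_id //= base
    assert token_id == 0, "length too small for this token id"
    return digits[::-1]

def from_subtokens(digits, base=2):
    """Inverse of to_subtokens (requires no masked positions)."""
    value = 0
    for digit in digits:
        value = value * base + digit
    return value

def partially_mask(subtokens, mask_prob, rng):
    """Mask each sub-token independently, so a token can be partly observed."""
    return [MASK if rng.random() < mask_prob else s for s in subtokens]

# 16 binary sub-tokens cover ids up to 2**16 - 1 = 65535.
rng = random.Random(0)
bits = to_subtokens(1234)
noisy = partially_mask(bits, mask_prob=0.5, rng=rng)
print(bits)
print(noisy)
```

Under standard token-level masking a position is either fully masked or fully visible; here a partly masked token still narrows the set of candidate ids, which is the intuition behind a partial masking scheme.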
arXiv cs.CL TIER_1 English(EN) · Shuiwang Ji · 2026-05-21 18:16

Learnability-Informed Fine-Tuning of Diffusion Language Models

We aim to improve the reasoning capabilities of diffusion language models (DLMs). While SFT is a popular post-training recipe for autoregressive models, its use in DLMs faces challenges and can even hurt performance, though the underlying causes remain understudied. Our analysis …
arXiv cs.CL TIER_1 English(EN) · Liqiang Nie · 2026-05-20 07:06

PulseCol: Periodically Refreshed Column-Sparse Attention for Accelerating Diffusion Language Models

Inference in diffusion large language models (dLLMs) is computationally expensive, as full self-attention must be repeatedly executed at each step of the denoising process without KV cache. Recent sparse attention methods for dLLMs mitigate this cost via block-sparse computation,…
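The abstract notes that, without a KV cache, full self-attention is recomputed at every denoising step, and the title describes periodically refreshed column-sparse attention. A minimal sketch under our own assumptions: on refresh steps, run dense attention and keep the key columns that receive the most total attention; on the steps in between, every query attends only to those cached columns. The class name, the column-scoring rule, and the `num_cols`/`refresh_every` parameters are hypothetical, not PulseCol's published method.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

class PeriodicColumnSparseAttention:
    """Hypothetical sketch: every `refresh_every` denoising steps, run dense
    attention and cache the key columns that receive the most attention;
    on the steps in between, attend only to those cached columns."""

    def __init__(self, num_cols, refresh_every):
        self.num_cols = num_cols
        self.refresh_every = refresh_every
        self.cols = None  # indices of the currently selected key columns

    def __call__(self, step, q, k, v):
        d = q.shape[-1]
        if self.cols is None or step % self.refresh_every == 0:
            # Refresh: dense attention, then rank columns by the total
            # attention probability they receive across all queries.
            probs = softmax(q @ k.T / np.sqrt(d))
            mass = probs.sum(axis=0)
            self.cols = np.sort(np.argsort(-mass)[: self.num_cols])
            return probs @ v
        # Sparse step: all queries share the same small set of key columns.
        kc, vc = k[self.cols], v[self.cols]
        return softmax(q @ kc.T / np.sqrt(d)) @ vc

# Toy denoising loop: q, k, v change every step because tokens are updated
# and, without a KV cache, attention is recomputed from scratch.
rng = np.random.default_rng(0)
n, d = 32, 16
attn = PeriodicColumnSparseAttention(num_cols=8, refresh_every=4)
for step in range(8):
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    out = attn(step, q, k, v)
```

The design bet is that the set of heavily attended columns drifts slowly across denoising steps, so paying for dense attention once every `refresh_every` steps amortizes the selection cost.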
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-19 12:01

Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention

Diffusion Language Models (DLMs) enable globally coherent, bidirectional, and controllable text generation, offering advantages over traditional autoregressive LLMs, while scaling to ultra-long sequences remains costly. Many existing block-sparse attention methods select blocks b…
arXiv cs.CL TIER_1 English(EN) · Naoaki Okazaki · 2026-05-19 07:22

Refined Target Drifting for Discrete Diffusion Language Models

Discrete diffusion language models (DDLMs) generate text by iteratively denoising categorical token sequences, while recent drifting methods for continuous generators suggest that part of this sampling-time correction can instead be absorbed into training through an anti-symmetri…
arXiv cs.CL TIER_1 English(EN) · James Kwok · 2026-05-15 06:56

Dynamic Chunking for Diffusion Language Models

Block discrete diffusion language models factorize a sequence autoregressively over fixed-size positional blocks, decoupling within-block parallel denoising from across-block conditioning. We argue that this rigid partition wastes structure already present in the sequence: blocks…
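To make the contrast concrete, the sketch below partitions one token sequence in two ways: fixed-size positional blocks, as in block discrete diffusion, and content-defined chunks whose boundaries depend on the tokens themselves. The excerpt does not say how DCDM chooses chunk boundaries, so `is_boundary` (here a simple "end of sentence" rule) and the `max_len` cap are hypothetical stand-ins.

```python
def fixed_blocks(tokens, block_size):
    """Positional partition used by block diffusion: equal-size blocks."""
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]

def content_chunks(tokens, is_boundary, max_len):
    """Content-defined partition: close a chunk after any boundary token,
    or when it reaches max_len tokens."""
    chunks, current = [], []
    for tok in tokens:
        current.append(tok)
        if is_boundary(tok) or len(current) == max_len:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks

tokens = "The cat sat . It was warm and sunny today . Done .".split()
# In block diffusion, each unit below would be denoised in parallel,
# conditioned on all units to its left.
print(fixed_blocks(tokens, 4))
print(content_chunks(tokens, is_boundary=lambda t: t == ".", max_len=8))
```

The fixed blocks cut the second sentence in half, while the content-defined chunks align with sentence ends; this illustrates the kind of "structure already present in the sequence" that the abstract argues rigid partitions waste.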
arXiv cs.CV TIER_1 English(EN) · Jiaya Jia · 2026-05-19 12:01

Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention

Diffusion Language Models (DLMs) enable globally coherent, bidirectional, and controllable text generation, offering advantages over traditional autoregressive LLMs, while scaling to ultra-long sequences remains costly. Many existing block-sparse attention methods select blocks b…
arXiv stat.ML TIER_1 English(EN) · Zhihan Yang, Wei Guo, Shuibai Zhang, Subham Sekhar Sahoo, Yongxin Chen, Arash Vahdat, Morteza Mardani, John Thickstun · 2026-05-19 04:00

Continuous Diffusion Models Are Competitive with Discrete Diffusion Models on Language Tasks

arXiv:2605.18530v1 Announce Type: cross Abstract: While diffusion has drawn considerable recent attention from the language modeling community, continuous diffusion has appeared less scalable than discrete approaches. To challenge this belief we revisit Plaid, a likelihood-based …
arXiv stat.ML TIER_1 English(EN) · John Thickstun · 2026-05-18 15:15

Continuous Diffusion Models Rival Discrete Diffusion Models in Language Generation

While diffusion has drawn considerable recent attention from the language modeling community, continuous diffusion has appeared less scalable than discrete approaches. To challenge this belief we revisit Plaid, a likelihood-based continuous diffusion language model (DLM), and con…
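Unlike discrete DLMs, which corrupt tokens by masking or replacing them, a continuous DLM adds Gaussian noise to vector representations of tokens. The excerpt does not give Plaid's parameterization, so the sketch below shows only a generic variance-preserving forward process on token embeddings, with an assumed cosine schedule and a simple nearest-embedding rounding heuristic. None of these function names come from Plaid or RePlaid.

```python
import numpy as np

def cosine_alpha_bar(t):
    """Assumed cosine schedule: fraction of signal variance kept at time t in [0, 1]."""
    return np.cos(0.5 * np.pi * t) ** 2

def noise_embeddings(token_ids, embedding, t, rng):
    """Variance-preserving Gaussian forward process on token embeddings:
    x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps, with a = alpha_bar(t)."""
    x0 = embedding[token_ids]
    a = cosine_alpha_bar(t)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

def nearest_tokens(x, embedding):
    """Simple rounding heuristic: map each vector to its nearest embedding."""
    d2 = ((x[:, None, :] - embedding[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
vocab_size, dim = 10, 4
embedding = rng.standard_normal((vocab_size, dim))
ids = np.array([3, 1, 4, 1, 5])
for t in (0.0, 0.5, 0.9):
    x_t = noise_embeddings(ids, embedding, t, rng)
    print(t, nearest_tokens(x_t, embedding))
```

At `t = 0` the noised vectors equal the clean embeddings and round back to the original ids; as `t` grows, more positions round to the wrong token. A trained denoiser would learn to reverse this corruption.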
