Brief · PulseAugur

FRONTIER RELEASE · Google DeepMind · 2d · [9 sources]

DiffusionGemma: 4x faster text generation

Google DeepMind has released DiffusionGemma, an experimental open-source model that generates text through a diffusion process instead of sequential token-by-token decoding. Because it processes entire blocks of text at once, it can produce output up to four times faster on GPUs. Output quality is lower than that of traditional autoregressive models such as Gemma 4, but DiffusionGemma is optimized for speed-critical, interactive local workflows and fits within 18GB of VRAM when quantized.
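To make the contrast between token-by-token and block-wise generation concrete, here is a rough sketch that only counts model forward passes. The function names, the block size of 32, and the 8 denoising steps are hypothetical values chosen for illustration; they are not DiffusionGemma's actual configuration, and pass counts are only a loose proxy for wall-clock speed.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """Sequential decoding: one forward pass per generated token."""
    return num_tokens


def block_diffusion_passes(num_tokens: int, block_size: int, denoise_steps: int) -> int:
    """Block-wise diffusion (assumed scheme): every token in a block is
    refined in parallel, so each block costs a fixed number of passes."""
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * denoise_steps


# Hypothetical settings, not taken from the release.
tokens = 256
ar = autoregressive_passes(tokens)
bd = block_diffusion_passes(tokens, block_size=32, denoise_steps=8)
print(f"autoregressive passes: {ar}")
print(f"block diffusion passes: {bd}")
```

Under these made-up settings the block-wise approach needs fewer passes because its cost scales with the number of blocks and denoising steps, not with the number of tokens. Real speedups depend on hardware, per-pass cost, and the model's true step count.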

IMPACT Accelerates local inference for interactive AI applications by enabling significantly faster text generation.

arXiv
Diffusion-Based Text-to-Microstructure Generation
Reddit
DiffusionGemma
NVIDIA GeForce RTX 5090
Gemma 4
Gemini Diffusion
Unsloth
NVIDIA H100
Apache 2.0
Google DeepMind
Hugging Face