DiffusionGemma: up to 4x faster text generation
Google DeepMind has released DiffusionGemma, an experimental open-source model that generates text with a diffusion process instead of producing one token at a time. Because it processes entire blocks of text simultaneously, it can generate output up to four times faster on GPUs. Output quality is lower than that of traditional autoregressive models such as Gemma 4, but DiffusionGemma is optimized for speed-critical, interactive local workflows and fits within 18GB of VRAM when quantized.
IMPACT Speeds up local inference for interactive AI applications, trading some output quality for up to 4x faster text generation.
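
To make the block-versus-token contrast concrete, here is a toy Python sketch that only counts model forward passes under two decoding schemes. It is not DiffusionGemma's actual algorithm or API. The function names, the block size of 32, and the 8 denoising steps are hypothetical values chosen so the ratio is easy to read, and pass counts are only a rough proxy for wall-clock speed.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """Token-by-token decoding: one forward pass per generated token."""
    return num_tokens


def block_diffusion_passes(num_tokens: int, block_size: int, denoise_steps: int) -> int:
    """Block-wise diffusion decoding (simplified assumption): each block of
    `block_size` tokens is refined in `denoise_steps` forward passes, and every
    pass updates all tokens in the block at once."""
    blocks = math.ceil(num_tokens / block_size)
    return blocks * denoise_steps


if __name__ == "__main__":
    n = 256  # tokens to generate (illustrative)
    ar = autoregressive_passes(n)
    bd = block_diffusion_passes(n, block_size=32, denoise_steps=8)  # hypothetical settings
    print(f"autoregressive: {ar} passes, block diffusion: {bd} passes, ratio: {ar / bd:.1f}x")
```

In this simplified model the pass-count ratio approaches block_size / denoise_steps, so the advantage depends on how few refinement steps each block needs. Real speedups also depend on per-pass cost, hardware, and batching, none of which the sketch models.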