DiffusionGemma is our new experimental open model with up to 4x faster output on dedicated GPUs.
Google DeepMind has released DiffusionGemma, an experimental open model that generates text in parallel blocks rather than token by token. This approach delivers up to four times faster output on dedicated GPUs and enables real-time self-correction and complex formatting. The model is based on the Gemma 4 architecture and uses a Mixture of Experts design with 26 billion total parameters, of which approximately 3.8 billion are active during inference.
AI IMPACT: Accelerates text generation through parallel processing and self-correction, with potential effects on content creation and complex data analysis.
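
As a rough back-of-the-envelope sketch of the figures above, the Python snippet below computes the share of parameters active per inference step and compares the number of forward passes needed by token-by-token decoding with a block-parallel scheme. The parameter counts come from the summary; the block size, the number of refinement steps per block, and all function names are hypothetical values chosen for illustration, not DiffusionGemma's actual configuration.

```python
import math

# Figures quoted in the summary (billions of parameters).
TOTAL_PARAMS_B = 26.0
ACTIVE_PARAMS_B = 3.8  # approximate, per the summary


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used during inference."""
    return active_b / total_b


def sequential_passes(num_tokens: int) -> int:
    """Token-by-token decoding: one forward pass per generated token."""
    return num_tokens


def block_parallel_passes(num_tokens: int, block_size: int, refine_steps: int) -> int:
    """Block-parallel decoding under an ASSUMED scheme: each block of
    `block_size` tokens is produced with `refine_steps` forward passes."""
    return math.ceil(num_tokens / block_size) * refine_steps


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Active parameter share: {frac:.1%}")

    # Hypothetical settings, for illustration only.
    tokens, block, steps = 256, 32, 8
    seq = sequential_passes(tokens)
    par = block_parallel_passes(tokens, block, steps)
    print(f"Sequential passes: {seq}, block-parallel passes: {par}")
```

Forward-pass counts are only a proxy: actual speedup depends on hardware, batch size, and how expensive each pass is, so this sketch should not be read as reproducing the "up to four times" figure.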