Google DeepMind releases DiffusionGemma 26B multimodal model

By PulseAugur Editorial · [1 source] · 2026-06-11 03:28

Google DeepMind has released DiffusionGemma 26B A4B IT, an open-weights multimodal generative model that accepts text, image, and video inputs and produces text output. Built on a Gemma 4 26B A4B Mixture-of-Experts architecture, it has 25.2 billion total parameters, of which 3.8 billion are active. The model supports a 256K-token context window, multilingual inference in more than 35 languages, and can generate over 1,100 tokens per second on NVIDIA H100 GPUs.
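
To put the reported figures in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted in the summary (25.2 billion total parameters, 3.8 billion active, at least 1,100 tokens per second). The function names and the 10,000-token response length are hypothetical, chosen for illustration, and the timing assumes throughput stays constant at the quoted floor, which real workloads may not match.

```python
# Figures quoted in the summary above.
TOTAL_PARAMS_B = 25.2      # total parameters, in billions
ACTIVE_PARAMS_B = 3.8      # active parameters, in billions
TOKENS_PER_SECOND = 1_100  # reported lower bound on NVIDIA H100 GPUs


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters that are active."""
    return active_b / total_b


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Generation time, assuming constant throughput (a simplification)."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Active share of parameters: {frac:.1%}")  # 15.1%

    # Hypothetical 10,000-token response at the quoted minimum throughput.
    secs = seconds_to_generate(10_000, TOKENS_PER_SECOND)
    print(f"Time for 10,000 tokens: {secs:.1f} s")  # 9.1 s
```

Because the summary says "over 1,100 tokens per second", the 9.1-second figure is best read as an upper bound under these assumptions rather than a measured result.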

IMPACT Accelerates multimodal AI development with an open-weights model supporting text, image, and video inputs.

RANK_REASON This is a new model release from a frontier lab (Google DeepMind).

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/pmttyji · 2026-06-11 03:28

nvidia/diffusiongemma-26B-A4B-it-NVFP4 · Hugging Face

https://www.reddit.com/r/LocalLLaMA/comments/1u2np0a/nvidiadiffusiongemma26ba4bitnvfp4_hugging_face/
