DiffusionGemma 26B struggles with accuracy and context despite high speeds on 4090

By PulseAugur Editorial · [1 source] · 2026-06-18 22:29

A Reddit user shared their experience running the DiffusionGemma 26B model on a 4090 GPU, reporting speeds of 290-700 tokens/second. However, they found the setup to be single-user only, less accurate than standard Gemma models, and prone to context fading. The user concluded that the model is not worth the effort, since a regular 26B model running through llama.cpp offers better performance and accuracy.

IMPACT This model's performance issues suggest limited utility for general users despite high theoretical speeds.

RANK_REASON User review of a specific model's performance on consumer hardware.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/teachersecret · 2026-06-18 22:29

DiffusionGemma 26b on a 4090 at up to 475t/s... and some thoughts...

Figured I'd post up a bit of info for anyone else who was thinking about messing with this model on a 3090/4090.

Obviously I can't use the nvfp4, but I got it up and running in vLLM using diffusiongemma-26B-A4B-it-AWQ-INT4. Had to run it i…
