Brief · PulseAugur

TOOL · r/LocalLLaMA English(EN) · 5h

DiffusionGemma 26b on a 4090 at up to 475t/s... and some thoughts...

A Reddit user shared their experience running the DiffusionGemma 26B model on a 4090 GPU, reporting speeds between 290 and 700 tokens/second. However, they found the model to be single-user only, less accurate than standard Gemma models, and prone to context fading. The user concluded that the model is not worth the effort, since a regular 26B model running through llama.cpp offers better performance and accuracy.

IMPACT The model's accuracy and context-fading problems suggest limited utility for general users despite its high reported token throughput.

4090
DiffusionGemma 26B