DiffusionGemma 26b on a 4090 at up to 475t/s... and some thoughts...
A Reddit user shared their experience running the DiffusionGemma 26B model on a 4090 GPU, reporting speeds between 290 and 700 tokens/second. However, they found the model to be single-user only, less accurate than standard Gemma models, and prone to context fading. The user concluded that the model is not worth the effort, since a regular 26B model running through llama.cpp offers better performance and accuracy.
AI IMPACT: The model's accuracy and context-retention problems suggest limited utility for general users despite its high raw speeds.