PulseAugur

gemma4:26b-a4b-it-qat model reaches 15 tokens/sec on consumer GPU

A user reported that the gemma4:26b-a4b-it-qat model ran at 15 tokens per second on a Windows 11 machine with an Nvidia 4070 GPU (8 GB VRAM) and 16 GB of RAM. The user was surprised that this was almost as fast as the 12B model.
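The post does not say how the 15 tokens per second figure was measured. Since it is tagged #ollama, one plausible way to reproduce such a number is to read the `eval_count` and `eval_duration` fields that Ollama's `/api/generate` endpoint returns for a non-streaming request. The sketch below assumes a local Ollama server on its default port with the model already pulled; the helper names `tokens_per_second` and `measure` are illustrative and not part of any library.

```python
import json
import urllib.request


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from Ollama's response metrics.

    eval_count is the number of generated tokens; eval_duration is
    reported in nanoseconds, so it is converted to seconds first.
    """
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def measure(model: str, prompt: str, host: str = "http://localhost:11434") -> float:
    """Run one non-streaming generation and return tokens per second.

    Assumes an Ollama server is listening at `host` (default port assumed).
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        stats = json.load(response)
    return tokens_per_second(stats["eval_count"], stats["eval_duration"])
```

With the server running, `measure("gemma4:26b-a4b-it-qat", "Write a haiku about GPUs.")` would return a figure comparable to the one in the post. A single short prompt is a rough measurement, so averaging several runs gives a steadier number.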

IMPACT Suggests that a 26B-class model can run at usable speeds on consumer hardware, potentially lowering barriers to entry for AI experimentation.

RANK_REASON User report on model performance on consumer hardware.

Read on Mastodon — fosstodon.org →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    Holy crap y'all, gemma4:26b-a4b-it-qat runs at 15 tokens per second on this Nvidia 4070, 8 GB VRAM, 16 GB RAM, Windows 11, almost as fast as the 12B model! What sourcery is this? #ai #ollama