Gemma 4 12B runs on old GTX 1080 Ti, Q8 quantization fixes errors

By PulseAugur Editorial · [1 source] · 2026-06-05 04:41

A user detailed their experience running Google's new Gemma 4 12B model on an older GTX 1080 Ti GPU. At Q4_K_M quantization, the model ran at a usable speed of around 28 tokens/sec for chat and drafting, fitting within about 8 GB of VRAM on a single card (the GTX 1080 Ti has 11 GB). For more detail-sensitive tasks such as bioinformatics, however, the Q4 version produced visible glitches and factual errors. Switching to Q8 quantization resolved them, at the cost of slower generation and a second GPU.
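The single-card versus two-card split can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only and does not come from the article. It assumes exactly 12 billion parameters, approximate average bits-per-weight figures for the two quantization formats, and 11 GB of VRAM per GTX 1080 Ti. It counts weights only, so KV cache, activations, and runtime overhead would push real usage higher.

```python
import math

# Assumptions (not from the article): parameter count, approximate
# average bits per weight for each format, and VRAM per card.
PARAMS = 12e9
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5}
VRAM_GB_PER_CARD = 11.0


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Size of the quantized weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


def cards_needed(size_gb: float, vram_gb: float = VRAM_GB_PER_CARD) -> int:
    """Minimum number of cards whose combined VRAM holds the weights."""
    return math.ceil(size_gb / vram_gb)


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        size = weights_gb(PARAMS, bpw)
        print(f"{name}: ~{size:.2f} GB of weights, {cards_needed(size)} card(s)")
```

Under these assumptions the Q4 weights come to roughly 7.3 GB and fit on one card, while the Q8 weights come to roughly 12.8 GB and exceed a single card's 11 GB, which is consistent with the reported need for a second GPU.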

IMPACT Demonstrates that newer, smaller models can run on older hardware for basic tasks, though higher-precision quantization (Q8 rather than Q4) is needed for accuracy-sensitive work.

RANK_REASON User-level evaluation of a new model on older hardware.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

dev.to — LLM tag TIER_1 English(EN) · byeongsoo kang · 2026-06-05 04:41

Running Brand-New Gemma 4 12B on an 8-Year-Old GTX 1080 Ti: Speed, 3 Gotchas, and Why Q8 Beat Q4 on My Own Field

TL;DR (Quick Answer): Gemma 4 12B just dropped, so I ran it on a GTX 1080 Ti (Pascal, 2017) to see what an 8-year-old card does with a 2026 model. Real numbers, and a few honest surprises: Speed: ~28 tok/s at Q4_K_M on …
