Gemma 4 QAT models benchmarked on AMD Strix Halo APU

By PulseAugur Editorial · [1 source] · 2026-06-06 14:22

A user benchmarked the quantization-aware training (QAT) versions of Google's Gemma 4 models on an AMD Strix Halo APU. The tests used llama.cpp with the Vulkan/RADV backend and covered the 12B, 26B, and 31B parameter sizes. The post also details the host system specifications and the process of converting and loading the QAT assistant heads for optimal performance.

IMPACT Provides performance data for running Gemma 4 models on consumer-grade AMD hardware, informing potential deployment strategies.
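As a rough aid for the deployment question, the sketch below estimates how much memory the weights alone would need at Q4_0, the quantization format named in the source post's title. It relies on ggml's Q4_0 block layout (32 weights per block, stored as one 2-byte fp16 scale plus 16 bytes of packed 4-bit values, which works out to 4.5 bits per weight). The parameter counts are the nominal 12B, 26B, and 31B sizes from the summary, and the function names are hypothetical. Treat the results as a lower-bound approximation: real GGUF files may keep some tensors at higher precision, and the KV cache and runtime buffers are not counted. None of these figures come from the benchmark itself.

```python
# Rough weight-footprint estimate for Q4_0-quantized models.
# Assumption: every weight is stored in ggml's Q4_0 layout. Real GGUF files
# may keep some tensors at higher precision, so actual sizes can be larger.

Q4_0_BLOCK_WEIGHTS = 32      # weights per Q4_0 block
Q4_0_BLOCK_BYTES = 2 + 16    # one fp16 scale + 32 packed 4-bit values


def q4_0_bits_per_weight() -> float:
    """Effective storage cost per weight, including the per-block scale."""
    return Q4_0_BLOCK_BYTES * 8 / Q4_0_BLOCK_WEIGHTS


def estimate_weight_gib(n_params: float) -> float:
    """Approximate weight storage in GiB for n_params weights at Q4_0."""
    total_bytes = n_params / Q4_0_BLOCK_WEIGHTS * Q4_0_BLOCK_BYTES
    return total_bytes / 2**30


if __name__ == "__main__":
    # Nominal sizes from the summary; exact parameter counts are an assumption.
    for label, n_params in [("12B", 12e9), ("26B", 26e9), ("31B", 31e9)]:
        print(f"{label}: about {estimate_weight_gib(n_params):.1f} GiB of weights")
```

On a unified-memory APU such as Strix Halo, an estimate like this has to fit alongside the KV cache and whatever the host system itself uses, which is why the per-block scale overhead is included rather than assuming a flat 4 bits per weight.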

RANK_REASON User-generated benchmark of an existing model on specific hardware.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/westsunset · 2026-06-06 14:22

Gemma 4 QAT Q4_0 Bench on Strix Halo

These are Google's official Gemma 4 QAT Q4_0 GGUF models, served locally through llama.cpp Vulkan/RADV on a Strix Halo APU.

QAT means quantization-aware training. Instead of ta…
