Gemma 4 QAT Q4_0 Bench on Strix Halo
A user benchmarked Google's Gemma 4 models, specifically the quantization-aware training (QAT) versions, on an AMD Strix Halo APU. The tests used llama.cpp with the Vulkan/RADV backend to evaluate performance across different model sizes, including 12B, 26B, and 31B parameters. The user also detailed the host system specifications and the process of converting and loading the QAT assistant heads for optimal performance.
IMPACT Provides performance data for running Gemma 4 models on consumer-grade AMD hardware, informing potential deployment strategies.