Brief · PulseAugur

TOOL · r/LocalLLaMA · 1d

Qwen3.6-35B vs Gemma4-26B on 7900 XTX

A user compared Qwen3.6-35B and Gemma4-26B on a Radeon 7900 XTX GPU. Gemma4-26B finished tasks about 20% faster end to end, even though Qwen3.6-35B emitted tokens at a significantly higher rate. The user attributes the gap to Qwen generating roughly twice as many tokens per answer, including its internal reasoning steps. The user concluded that Qwen, with its faster decode speed, is better suited to batch processing, while Gemma is preferable for latency-sensitive single requests. When a model reasons before answering, the total number of tokens it needs per prompt matters more than its raw tokens-per-second rate.
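The conclusion rests on simple arithmetic: end-to-end time is roughly the number of generated tokens divided by the decode rate. The sketch below is a back-of-the-envelope model, not the user's benchmark method. It assumes prompt processing (prefill) and request overhead can be ignored, and it reads "20% faster" as "Qwen takes 1.2x as long as Gemma". The function names and the absolute token counts and rates are hypothetical, chosen only so that they reproduce the ratios reported in the post.

```python
"""Back-of-the-envelope model: end-to-end time = tokens generated / decode rate.

Assumptions (not from the original post): prefill time and request overhead
are ignored, and "20% faster" is read as "Qwen takes 1.2x as long as Gemma".
"""


def end_to_end_seconds(tokens_generated: float, decode_tok_per_s: float) -> float:
    """Approximate wall-clock time for one answer, decode phase only."""
    return tokens_generated / decode_tok_per_s


def implied_decode_ratio(token_ratio: float, time_ratio: float) -> float:
    """Decode-rate ratio r_A / r_B implied by the other two ratios.

    token_ratio: tokens model A generates per answer / tokens model B generates.
    time_ratio:  end-to-end time of A / end-to-end time of B.
    Since T_A / T_B = (N_A / N_B) * (r_B / r_A), it follows that
    r_A / r_B = token_ratio / time_ratio.
    """
    return token_ratio / time_ratio


if __name__ == "__main__":
    # Ratios from the post: Qwen emits ~2x the tokens and is ~20% slower overall.
    ratio = implied_decode_ratio(token_ratio=2.0, time_ratio=1.2)
    print(f"Qwen decode rate ~ {ratio:.2f}x Gemma's")

    # Hypothetical absolute numbers chosen only to match those ratios.
    gemma_s = end_to_end_seconds(tokens_generated=1000, decode_tok_per_s=60)
    qwen_s = end_to_end_seconds(tokens_generated=2000, decode_tok_per_s=100)
    print(f"Gemma: {gemma_s:.1f} s, Qwen: {qwen_s:.1f} s")
```

Under these assumptions, Qwen would need to decode about 1.67x faster than Gemma just to break even per answer. Any decode advantage smaller than the token-count ratio leaves it slower end to end.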

IMPACT In this user's test, Gemma4-26B completed tasks faster end to end than Qwen3.6-35B, suggesting that the number of tokens a model generates per answer, not just how fast it emits them, is key for latency-sensitive applications.

llama.cpp
Qwen3.6-35B
Gemma4-26B
Radeon 7900 XTX