PulseAugur

Gemma4-26B finishes tasks faster than Qwen3.6-35B despite slower token output

A user compared Qwen3.6-35B and Gemma4-26B on a Radeon 7900 XTX GPU and found that Gemma4-26B completed tasks roughly 20% faster end to end, even though Qwen3.6-35B emits tokens significantly faster. The gap is attributed to Qwen generating roughly twice as many tokens per answer, internal reasoning steps included. The user concluded that Qwen's decode speed suits batch processing, while Gemma is preferable for latency-sensitive single requests: when reasoning is enabled, the total number of tokens needed to answer a prompt matters more than raw tokens per second.
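The trade-off described above can be sketched with simple arithmetic. The snippet below is a minimal illustration, not the poster's benchmark code: the function name `end_to_end_seconds` is hypothetical, the token counts and decode rates are invented for illustration (they are not figures from the post), and the model assumes decode time dominates, ignoring prompt processing and load time.

```python
def end_to_end_seconds(total_tokens: int, tokens_per_second: float) -> float:
    """Decode-only latency estimate (assumption: prompt processing is ignored)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return total_tokens / tokens_per_second


# Hypothetical numbers, chosen only to mirror the pattern in the summary:
# one model decodes faster but emits twice as many tokens per answer.
verbose_fast_time = end_to_end_seconds(total_tokens=2000, tokens_per_second=80.0)
concise_slow_time = end_to_end_seconds(total_tokens=1000, tokens_per_second=50.0)

# Fraction of wall-clock time saved by the slower-decoding, more concise model.
time_saved = 1 - concise_slow_time / verbose_fast_time

print(f"verbose/fast model: {verbose_fast_time:.1f} s")
print(f"concise/slow model: {concise_slow_time:.1f} s")
print(f"time saved by the concise model: {time_saved:.0%}")
```

Under these assumed numbers, a model that emits twice as many tokens needs more than twice the decode rate just to match the concise model on a single request, which is why tokens per second alone can mislead.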

IMPACT Gemma4-26B offers faster end-to-end task completion than Qwen3.6-35B, suggesting token generation efficiency is key for latency-sensitive applications.

RANK_REASON User benchmark comparing two specific models on specific hardware.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 Deutsch(DE) · /u/IvGranite ·

    Qwen3.6-35B vs Gemma4-26B on 7900 XTX

    Ran a fair comparison between Qwen3.6-35B-A3B and Gemma4-26B-A4B on my Radeon 7900 XTX. Both reasoning-enabled at matching 32K budgets, no output caps, six generic real-world prompts (meeting notes, incident postmortem, log triage to JSON, code r…