Gemma-4-31B model hits 463K tokens/sec on TPU v6e-4 benchmarks

By PulseAugur Editorial Team · [1 source] · 2026-05-08 16:57

A performance report details the Gemma-4-31B model's performance on Cloud TPU v6e-4 hardware, where it reached a peak prefill throughput of 463,345 tokens/sec. The benchmarks indicate that the dense 31B model offers throughput comparable to a 26B MoE model on the same hardware, with better latency for interactive tasks. However, the MoE model demonstrates superior compute efficiency and can handle much larger context windows.
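To put the headline figure in perspective, the following is a minimal sketch of two back-of-the-envelope calculations: average throughput per chip and an idealized prefill time for a single prompt. It assumes a v6e-4 slice contains 4 TPU chips and that work is spread evenly across them; the helper names and the 8,192-token prompt size are hypothetical and do not come from the report. Peak throughput is typically measured under batched load, so the single-prompt time is a lower bound rather than an expected latency.

```python
# Peak prefill throughput reported in the summary (tokens per second).
PEAK_PREFILL_TOKENS_PER_SEC = 463_345

# Assumption: a v6e-4 slice has 4 TPU chips.
CHIPS_IN_SLICE = 4


def per_chip_throughput(total_tokens_per_sec: float, chips: int) -> float:
    """Average tokens/sec per chip, assuming work is spread evenly."""
    return total_tokens_per_sec / chips


def ideal_prefill_seconds(prompt_tokens: int, tokens_per_sec: float) -> float:
    """Lower bound on prefill time if the peak rate were sustained."""
    return prompt_tokens / tokens_per_sec


if __name__ == "__main__":
    per_chip = per_chip_throughput(PEAK_PREFILL_TOKENS_PER_SEC, CHIPS_IN_SLICE)
    print(f"per chip: {per_chip:,.0f} tokens/sec")

    # Hypothetical prompt size, chosen only for illustration.
    seconds = ideal_prefill_seconds(8_192, PEAK_PREFILL_TOKENS_PER_SEC)
    print(f"8,192-token prefill: {seconds * 1000:.1f} ms at peak rate")
```

The same two helpers can be reused with the 26B MoE figures from the original report to compare the models on equal terms.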

Impact: Demonstrates hardware-software co-optimization for dense models and offers insight into performance trade-offs against MoE architectures.

Ranking rationale: This is a performance report and benchmark analysis of a specific model on particular hardware, not a new model release or a significant industry event.


AI-generated summary · Google Gemini · from 1 source.


Sources [1]

dev.to — LLM tag TIER_1 Deutsch(DE) · xbill · 2026-05-08 16:57

Gemma-4-31B on v6e-4 TPU Benchmarks

This is a submission for the Gemma 4 Challenge: Build with Gemma 4 (https://dev.to/challenges/google-gemma-2026-05-06)

model: Gemma-4-31B

🚀 Gemma 4 TPU v6e-4 Performance Report

📋 Deployment Overview

Model:…

