PulseAugur

Gemma 4 31B model performance questioned on dual 9060 XT GPUs

A user on Reddit's r/LocalLLaMA is asking for advice about the performance of the Gemma 4 31B model at Q6 quantization on two 9060 XT graphics cards. They report a consistent inference speed of roughly 8-9 tokens per second, which they believe is slower than what other discussions suggest for this setup. The speed is usable for them, but they want to know whether they are overlooking any optimizations.
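Whether 8-9 tokens per second is actually slow can be sanity-checked with a rough memory-bandwidth estimate. The sketch below is illustrative only and nothing in it comes from the post itself: it assumes token generation is limited by reading every weight once per token, that the model is split by layers across the two cards so only one card's bandwidth is in play at any moment, about 6.5 bits per weight for a Q6-style quantization, and about 320 GB/s of memory bandwidth per card. It ignores KV cache reads and inter-GPU transfers, so real throughput should land below the ceiling it prints. The function name and constants are hypothetical.

```python
def decode_ceiling_tokens_per_s(params_billion, bits_per_weight, bandwidth_gb_s):
    """Rough upper bound on decode speed for a bandwidth-bound model.

    Assumes every weight is read exactly once per generated token and
    ignores KV cache traffic, compute time, and inter-GPU transfers.
    """
    # 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB = size in GB
    model_size_gb = params_billion * bits_per_weight / 8
    return bandwidth_gb_s / model_size_gb


# Illustrative inputs. All three are assumptions, not figures from the post.
PARAMS_BILLION = 31.0     # taken from the model name "31B"
BITS_PER_WEIGHT = 6.5     # assumed average for a Q6-style quantization
BANDWIDTH_GB_S = 320.0    # assumed per-card memory bandwidth

ceiling = decode_ceiling_tokens_per_s(PARAMS_BILLION, BITS_PER_WEIGHT, BANDWIDTH_GB_S)
print(f"estimated ceiling: {ceiling:.1f} tokens/s")

# Compare the reported range against the estimated ceiling.
for observed in (8.0, 9.0):
    print(f"{observed:.0f} tokens/s is {observed / ceiling:.0%} of the ceiling")
```

Under these assumptions the reported speed is a sizeable fraction of the ceiling, which would suggest the setup is not badly misconfigured. A different split mode or a smaller quantization would change the inputs rather than the method.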

IMPACT Potential for improved local LLM inference speeds for users with similar hardware configurations.

RANK_REASON User-level discussion about optimizing a specific model on consumer hardware.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English (EN) · /u/beigepccase

    Gemma 4 31B Q6 on Dual 9060 XT

    https://www.reddit.com/r/LocalLLaMA/comments/1ucenk7/gemma_4_31b_q6_on_dual_9060_xt/