PulseAugur

Dual Radeon R9700 GPUs power Qwen 3.6 27B model with high throughput

A user shared their experience setting up and testing the Qwen 3.6 27B model on two Radeon R9700 GPUs using llama.cpp. Token generation reached up to 67 tokens/s with a 10k to 13k-token context and stayed above 40 tokens/s with a 125k-token context. Prefill throughput exceeded 1,000 tokens/s for prompts under 10k tokens and was around 400 tokens/s for prompts over 100k tokens. The user described their hardware, software, and testing methodology, reported decode and prefill throughput from the server logs, and discussed prompt caching strategies.
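To make the long-context figures concrete, here is a rough estimate of how long a single request would take. It is a minimal sketch that assumes constant rates of about 400 tokens/s for prefill and 40 tokens/s for decode (the long-context numbers reported above) and ignores scheduling, sampling, and other overhead. The caching part is a simplified prefix-reuse model for illustration, not the user's actual caching setup. `Throughput` and `request_seconds` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Throughput:
    """Assumed constant throughput figures, in tokens per second."""
    prefill_tps: float  # prompt-processing speed
    decode_tps: float   # token-generation speed


def request_seconds(prompt_tokens: int, output_tokens: int, tp: Throughput,
                    cached_prefix_tokens: int = 0) -> float:
    """Estimate wall-clock time for one request (hypothetical helper).

    Simplified model: tokens already held in a reusable prompt cache
    (a shared prefix) are skipped during prefill; the remaining prompt
    tokens are processed at prefill_tps, then the output is generated
    at decode_tps.
    """
    if not 0 <= cached_prefix_tokens <= prompt_tokens:
        raise ValueError("cached prefix must lie within the prompt")
    uncached = prompt_tokens - cached_prefix_tokens
    return uncached / tp.prefill_tps + output_tokens / tp.decode_tps


# Long-context figures from the summary: ~400 t/s prefill, ~40 t/s decode.
long_ctx = Throughput(prefill_tps=400, decode_tps=40)

# Cold request: the whole 120k-token prompt must be prefilled.
cold = request_seconds(120_000, 1_000, long_ctx)
# Warm request: a 118k-token prefix is assumed to be reused from the cache.
warm = request_seconds(120_000, 1_000, long_ctx, cached_prefix_tokens=118_000)
print(f"cold: {cold:.0f} s, warm: {warm:.0f} s")  # prints "cold: 325 s, warm: 30 s"
```

Under these assumptions, the cold request spends 300 s on prefill and only 25 s on decode, so at long context prefill dominates and reusing a cached prefix saves far more time than a faster decode would.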

IMPACT Demonstrates efficient multi-GPU inference for large language models on consumer hardware, potentially lowering barriers to entry for advanced AI tasks.

RANK_REASON User-generated report on running a specific model with specific hardware, including performance metrics.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English (EN) · /u/Kal-LZ

    2× Radeon R9700 — Qwen 3.6 27B Q8 MTP on llama.cpp

    There isn't much information around about multi-GPU setups with the R9700, so I'm writing this up in case it helps anyone in the same situation. Here's my setup, the tests I ran, and the numbers from the server logs.

    Setup

    - T…