Dual Radeon R9700 GPUs deliver high throughput for the Qwen 3.6 27B model

By the PulseAugur editorial team · 1 source · 2026-06-21 14:35

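A user shared their experience setting up and testing the Qwen 3.6 27B model on llama.cpp with two Radeon R9700 GPUs. The setup reached a token-generation (decode) speed of 67 tokens/s at 10-13k tokens of context and more than 40 tokens/s at 125k tokens of context. Prefill throughput was also strong: over 1,000 tokens/s for prompts under 10k tokens, and about 400 tokens/s for prompts over 100k tokens. The user described their hardware, software, and test methodology, reported decode and prefill throughput figures, and discussed prompt-caching strategies.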
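To put these rates in perspective, here is a minimal back-of-envelope sketch in Python. It assumes prefill and decode run one after the other at constant rates, which real servers only approximate. The function name and the `cached_tokens` parameter are hypothetical: `cached_tokens` is a generic stand-in for prompt caching, not the user's specific strategy or any llama.cpp option.

```python
# Back-of-envelope latency estimate from reported throughput figures.
# Assumption: prefill and decode run sequentially at constant rates;
# real throughput varies with context length, batching and caching.

def estimate_latency_s(prompt_tokens: int, output_tokens: int,
                       prefill_tps: float, decode_tps: float,
                       cached_tokens: int = 0) -> float:
    """Return the estimated seconds for one request.

    cached_tokens (hypothetical): prompt tokens already held in the KV
    cache from an earlier request, so they are not prefilled again.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    # Only the uncached part of the prompt has to be prefilled.
    prefill_s = (prompt_tokens - cached_tokens) / prefill_tps
    # Generated tokens are produced one by one at the decode rate.
    decode_s = output_tokens / decode_tps
    return prefill_s + decode_s


if __name__ == "__main__":
    # Rates from the summary: ~400 tok/s prefill for a >100k prompt and
    # >40 tok/s decode at 125k context (40 used as a conservative floor).
    # Applying 400 tok/s to the short uncached suffix is a simplification.
    cold = estimate_latency_s(100_000, 1_000, prefill_tps=400, decode_tps=40)
    warm = estimate_latency_s(100_000, 1_000, prefill_tps=400, decode_tps=40,
                              cached_tokens=98_000)
    print(f"cold: {cold:.0f} s, warm: {warm:.0f} s")
```

With these inputs the script prints `cold: 275 s, warm: 30 s`. A 100k-token prompt spends roughly four minutes in prefill alone, which is why reusing a cached prompt prefix matters so much at long context.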

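Impact: The report shows efficient multi-GPU inference for a large language model on consumer-grade hardware, which could lower the barrier to entry for demanding AI workloads.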

Ranking rationale: A user-generated report on running a specific model on specific hardware, including performance metrics.


AI-generated summary · Google Gemini · based on 1 source.


Sources [1]

r/LocalLLaMA · TIER_1 · English (EN) · /u/Kal-LZ · 2026-06-21 14:35

2× Radeon R9700 — Qwen 3.6 27B Q8 MTP on llama.cpp

There isn't much information around about multi-GPU setups with the R9700, so I'm writing this up in case it helps anyone in the same situation. Here's my setup, the tests I ran, and the numbers from the server logs.

Setup

- T…
