$1800 (in GPU cost running with P2P running Qwen/Qwen3.6-27b-FP8 with 262K context and BF16 KV cache at 55 tok/s

Local LLM setup reaches 55 tok/s and 262K context on a $1800 GPU kit

By PulseAugur Editorial · [1 source] · 2026-06-19 23:30

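A user shared their setup for running the Qwen3.6-27B-FP8 model locally, reaching 55 tokens per second with a 262K context window. The setup consists of four 16GB 5060 Ti GPUs with P2P enabled, at a GPU hardware cost of about $1800. The configuration is intended for inference-only, single-user use.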
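The memory arithmetic behind such a setup can be sketched roughly as follows. This is a minimal illustration, not the poster's method: only the GPU count, per-GPU memory, parameter count, FP8 weights, BF16 KV cache, and 262K context come from the post. Reading "16GB" as 16 GiB and "262K" as 262,144 tokens are assumptions, the KV formula assumes standard attention layers, and the layer count, KV head count, and head dimension are hypothetical placeholders rather than the model's real configuration.

```python
# Rough VRAM budget sketch. Only the values marked "from the post" are sourced;
# everything else is an assumption made for illustration.

GIB = 1024 ** 3


def weight_bytes(n_params: float, bytes_per_param: float) -> float:
    """Memory taken by the model weights alone."""
    return n_params * bytes_per_param


def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV cache size for standard attention layers.

    The leading factor 2 accounts for one K and one V tensor per layer.
    bytes_per_elem=2 corresponds to BF16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


# From the post: four 16GB GPUs (treated here as 16 GiB each, an assumption).
total_vram = 4 * 16 * GIB

# From the post: 27B parameters in FP8, i.e. 1 byte per parameter.
weights = weight_bytes(27e9, 1)

# What remains for KV cache, activations, and runtime overhead.
headroom_gib = (total_vram - weights) / GIB

# Hypothetical architecture values, NOT the real Qwen3.6-27B configuration.
context_tokens = 262_144  # "262K" read as 256 * 1024 (assumption)
kv = kv_cache_bytes(context_tokens, n_layers=16, n_kv_heads=4, head_dim=256)

print(f"weights:  {weights / GIB:.1f} GiB")
print(f"headroom: {headroom_gib:.1f} GiB")
print(f"KV cache (hypothetical config): {kv / GIB:.1f} GiB")
```

Substituting the real layer count, KV head count, and head dimension from the model's published configuration would show whether a full-context BF16 cache fits in the headroom left after the weights.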

Impact: Demonstrates local inference performance with a large context window on consumer-grade hardware.

Ranking rationale: A user-shared local setup and performance metrics for running a specific LLM.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA TIER_1 English(EN) · /u/joorklee · 2026-06-19 23:30

$1800 (in GPU cost running with P2P running Qwen/Qwen3.6-27b-FP8 with 262K context and BF16 KV cache at 55 tok/s

Hey peeps, wanted to share what is possible for folks with an inference only single user use case with 1700 in GPU cost. Setup: 4x 5060 ti (16GB) with P2P. If you are in the US and you keep an eye on facebook marketp…