Cheapest setup for >10 tok/sec for 120B dense LLM

User seeks the cheapest hardware for running a fast 120B LLM

By the PulseAugur editorial team · [1 source] · 2026-06-09 08:17

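A user on the r/LocalLLaMA subreddit is looking for the most cost-effective hardware configuration for running a 120-billion-parameter dense large language model (LLM) at more than 10 tokens per second. The user needs the setup to generate fast responses for a role-playing game campaign, ideally with a 64,000-token context window and a quantized model (Q5 or Q6). They are weighing CPU-only, GPU-only, and hybrid inference setups, and note the substantial VRAM requirements of GPU-based solutions.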
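The numbers in the question (120B dense, Q5 or Q6, more than 10 tokens per second) allow a rough sizing estimate. The sketch below is a back-of-envelope calculation, not something taken from the original post. It assumes that token generation is limited by memory bandwidth and that a dense model reads every weight once per generated token. The bits-per-weight values are approximations chosen for illustration, since real quantized files mix formats, and the function names are hypothetical.

```python
# Back-of-envelope sizing for a dense LLM. Every constant here is an assumption.

# Approximate bits per weight for each quantization level (assumed values;
# actual quantized files vary depending on the exact quantization mix).
BITS_PER_WEIGHT = {"Q5": 5.5, "Q6": 6.5625}


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to hold the quantized weights alone (no KV cache)."""
    return n_params * bits_per_weight / 8


def min_bandwidth_gb_s(n_params: float, bits_per_weight: float,
                       tok_per_sec: float) -> float:
    """Memory bandwidth in GB/s needed if every weight is read once per token."""
    return weight_bytes(n_params, bits_per_weight) * tok_per_sec / 1e9


if __name__ == "__main__":
    n_params = 120e9   # 120B dense model, as in the question
    target_rate = 10   # tokens per second, as in the question
    for name, bpw in BITS_PER_WEIGHT.items():
        size_gb = weight_bytes(n_params, bpw) / 1e9
        bandwidth = min_bandwidth_gb_s(n_params, bpw, target_rate)
        print(f"{name}: ~{size_gb:.1f} GB of weights, "
              f"~{bandwidth:.0f} GB/s for {target_rate} tok/sec")
```

Under these assumptions the weights alone come to roughly 80 to 100 GB, and sustaining 10 tokens per second needs memory bandwidth on the order of 800 to 1,000 GB/s. The KV cache for a 64,000-token context comes on top of this and depends on the model architecture, so it is left out here.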

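Ranking rationale: This is a user question about a specific hardware setup for running LLMs locally, not a major industry announcement or development.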

Infrastructure

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA TIER_1 English(EN) · /u/TrainingTwo1118 · 2026-06-09 08:17

Cheapest setup for >10 tok/sec for 120B dense LLM

Hi all, I'm trying to wrap my head around hardware variables when it comes to LLM, and I have another question: what would be the cheapest way to run a 120B dense LLM at >10 tok/sec? I'm fine with Q5, ideally Q6 though. …
