Another shout out to llama.cpp build b9455 2x3090

llama.cpp build b9455 reaches 70+ tokens/sec on Qwen3.6-27B

By the PulseAugur editorial team · [1 source] · 2026-06-03 05:05

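A user in Reddit's r/LocalLLaMA community shared an impressive performance gain from a new llama.cpp build, b9455. Combined with tensor splitting across two RTX 3090 GPUs, the updated build exceeded 70 tokens per second with the Qwen3.6-27B-UD-Q8_K_XL model. That is well above the previous range of 30-50 tokens per second and matches performance previously seen only with vLLM.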
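To put the reported figures in proportion, the short sketch below computes the implied speedup range from the numbers in the summary (70 tokens/sec against the earlier 30-50 tokens/sec). Because the post reports "70+", these values are lower bounds. The function names and the 1,000-token reply length are illustrative assumptions, not taken from the original post.

```python
def speedup_range(new_tps: float, old_low: float, old_high: float) -> tuple[float, float]:
    """Return (smallest, largest) speedup of new_tps over an older throughput range."""
    # The smallest speedup is measured against the best old figure,
    # the largest against the worst old figure.
    return new_tps / old_high, new_tps / old_low


def seconds_for(tokens: int, tps: float) -> float:
    """Time in seconds to generate `tokens` tokens at a steady `tps` tokens/sec."""
    return tokens / tps


# Figures from the summary: 70+ tokens/sec now, 30-50 tokens/sec before.
lo, hi = speedup_range(70.0, 30.0, 50.0)
print(f"speedup: {lo:.2f}x to {hi:.2f}x")

# Hypothetical 1,000-token reply, chosen only for illustration.
for tps in (30.0, 50.0, 70.0):
    print(f"{tps:.0f} tok/s -> {seconds_for(1000, tps):.1f} s per 1000 tokens")
```

On these numbers the new build is at least 1.4x faster than the best earlier result and roughly 2.3x faster than the worst.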

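Impact: This llama.cpp update significantly improves inference speed for local LLM deployments and may allow more complex models to run efficiently on consumer hardware.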

Ranking rationale: Benchmark results shared by a user of an open-source inference engine.


AI-generated summary · Google Gemini · from 1 source.


Sources [1]

r/LocalLLaMA · English (EN) · /u/Fabulous_Fact_606 · 2026-06-03 05:05

Another shout out to llama.cpp build b9455 2x3090

https://www.reddit.com/r/LocalLLaMA/comments/1tvff62/another_shout_out_to_llamacpp_build_b9455_2x3090/

