Qwen3.6-35B-A3B APEX on a Single RTX 3090 - Getting the Most Out of It

Qwen3.6-35B-A3B Model Optimized for a Single RTX 3090 GPU

By the PulseAugur editorial team · [1 source] · 2026-06-22 12:51

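A Reddit user shared their process for optimizing the Qwen3.6-35B-A3B model on a single RTX 3090 GPU. Their goal was the highest quality and speed achievable with a 128k context window. Benchmarks indicate that the `ik_llama` engine with the `I-Compact` APEX model gives the fastest generation, while the `spiritbuun` engine with `I-Quality` and a TurboQuant cache delivers comparable speed and possibly higher quality. The `I-Quality` model posts strong metrics: its quality comes very close to that of higher-quality baseline models, while it is smaller and much faster than the reference BF16 model.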
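To see why a quantized cache matters at a 128k context on a 24 GB card such as the RTX 3090, the following is a minimal back-of-the-envelope sketch of the usual KV-cache size estimate for a standard attention stack. The layer count, KV-head count, and head dimension below are hypothetical placeholders, not values taken from the Qwen3.6-35B-A3B model card, and the estimate ignores the per-block scale overhead that real quantized cache formats carry.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bits_per_value: float) -> float:
    """Estimate KV-cache size for a standard attention stack.

    Two tensors (K and V) are stored per layer, each holding
    n_kv_heads * head_dim values per cached token.
    """
    values = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return values * bits_per_value / 8


# Hypothetical architecture values, chosen only for illustration.
# They are NOT the real Qwen3.6-35B-A3B configuration.
LAYERS, KV_HEADS, HEAD_DIM = 40, 4, 128
CTX = 128 * 1024  # the 128k context target mentioned in the summary

GIB = 1024 ** 3
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    size = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CTX, bits)
    print(f"{label} cache: {size / GIB:.2f} GiB")
```

Under these assumed numbers, halving the bits per cached value halves the cache footprint, which frees VRAM for model weights at the same context length. The actual savings depend on the real architecture and on the cache format the engine uses.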

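Impact: Offers insight into deploying large language models efficiently on consumer hardware, which may lower the barrier to using advanced AI.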

Ranking rationale: A user-generated guide to optimizing a specific model on consumer hardware.


AI-generated summary · Google Gemini · from 1 source.


Sources [1]

r/LocalLLaMA · /u/old-mike · 2026-06-22 12:51

Qwen3.6-35B-A3B APEX on a Single RTX 3090 - Getting the Most Out of It

Resources I used:
- https://github.com/ikawrakow/ik_llama.cpp - as the reference llama.cpp fork
- https://github.com/spiritbuun/buun-llama-cpp …
