unsloth vs bartowski MTP ggufs

Unsloth vs. Bartowski: MTP Performance Benchmarks on Qwen Models

By the PulseAugur editorial team · [1 source] · 2026-06-01 08:32

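A user on Reddit's r/LocalLLaMA compared the decoding performance of the Unsloth and Bartowski MTP (multi-token prediction) GGUFs for the Qwen 3.5-4B and 9B models. The comparison covered VRAM usage and tokens per second at several quantization levels (Q4_0, IQ4_NL, Q4_1, Q8_0). The two sets of files performed similarly, though in some tests the Unsloth files used slightly less VRAM and delivered slightly higher throughput.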

Impact: Provides practical performance data for users optimizing local LLM deployments.

Ranking rationale: A user-run benchmark comparing two implementations for open-source models.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

r/LocalLLaMA · /u/Ok_Warning2146 · 2026-06-01 08:32

unsloth vs bartowski MTP ggufs

I noticed that bartowski's MTP ggufs are bigger than unsloth's. I asked bartowski and he said he used a Q8_0 quant for the MTP head. So I compared the decoding performance of the two.

/build/bin/llama-server -m ~/gguf/Qwen3.5-4B-Q4_0.gguf --hos…
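The post's measurements are cut off in this excerpt, so the following is only a minimal sketch of how two decoding runs could be compared on the two metrics the summary mentions (tokens per second and VRAM). The names `DecodeRun`, `relative_diff_percent`, and `compare` are hypothetical helpers introduced here for illustration, and the numbers in the usage section are placeholders, not results from the post.

```python
from dataclasses import dataclass


@dataclass
class DecodeRun:
    """One decoding measurement for a single GGUF file (hypothetical record)."""
    label: str        # free-form name for the run, e.g. which file was loaded
    tokens: int       # tokens generated during the timed decode
    seconds: float    # wall-clock decode time in seconds
    vram_mib: float   # peak VRAM observed during the run, in MiB

    @property
    def tokens_per_second(self) -> float:
        # Decode throughput: generated tokens divided by elapsed time.
        return self.tokens / self.seconds


def relative_diff_percent(a: float, b: float) -> float:
    """Percentage by which a exceeds b (negative when a is smaller)."""
    return (a - b) / b * 100.0


def compare(a: DecodeRun, b: DecodeRun) -> dict:
    """Summarize how run a differs from run b on throughput and VRAM."""
    return {
        "tps_diff_percent": relative_diff_percent(
            a.tokens_per_second, b.tokens_per_second
        ),
        "vram_diff_mib": a.vram_mib - b.vram_mib,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only; NOT figures from the post.
    run_a = DecodeRun("gguf A", tokens=512, seconds=8.0, vram_mib=4000.0)
    run_b = DecodeRun("gguf B", tokens=512, seconds=8.5, vram_mib=4100.0)
    print(compare(run_a, run_b))
```

Reporting throughput as a relative percentage and VRAM as an absolute difference keeps small gaps between two similar files easy to read; for a fair comparison, both runs would need the same prompt, the same generation length, and the same server settings.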

