Ornith-1.0-35B GGUF update: native MTP speculative-decode graft + full serving/TTFT/long-context numbers (llama.cpp, tp=1)

Ornith-1.0-35B GGUF model updated with a speculative-decoding graft

By the PulseAugur editorial team · [1 source] · 2026-06-28 18:35

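A new release of the Ornith-1.0-35B model, specifically its GGUF build, adds a native multi-token prediction (MTP) speculative-decoding graft. The update speeds up single-stream decoding by 1.3–1.35×, reaching up to 233.8 tokens per second. The model keeps a low Kullback–Leibler divergence (KLD) of 0.073, better than Q4_K_M quantization, and delivers improved performance in long-context scenarios.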
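The summary does not explain how a speculative-decode graft can speed up decoding without changing the output. As a rough illustration only, here is a minimal Python sketch of greedy draft-then-verify speculative decoding, assuming a cheap draft predictor that stands in for the MTP head and toy integer "models". The names `target_next`, `draft_next`, `speculative_step`, `generate` and `greedy` are hypothetical, and the sketch does not model how the MTP head is actually grafted onto Ornith-1.0-35B or how llama.cpp implements speculative decoding.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function: token context -> next token (greedy).
NextToken = Callable[[List[int]], int]


def target_next(ctx: List[int]) -> int:
    """Hypothetical toy target model: next token is (last + 1) mod 10."""
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[int]) -> int:
    """Hypothetical toy draft head: agrees with the target except when last % 3 == 2."""
    last = ctx[-1]
    return (last + 2) % 10 if last % 3 == 2 else (last + 1) % 10


def speculative_step(ctx: List[int], target: NextToken, draft: NextToken, k: int) -> List[int]:
    """One greedy draft-then-verify step; returns the tokens committed this step."""
    # 1. Draft k tokens cheaply, each conditioned on the previous drafts.
    drafted: List[int] = []
    for _ in range(k):
        drafted.append(draft(ctx + drafted))
    # 2. Verify with the target. A real engine scores all k positions in one
    #    batched forward pass; this loop just makes the comparison explicit.
    committed: List[int] = []
    for tok in drafted:
        expected = target(ctx + committed)
        if tok != expected:
            committed.append(expected)  # first mismatch: keep the target's token
            return committed
        committed.append(tok)
    # 3. Every draft was accepted, so the verify pass also yields one bonus token.
    committed.append(target(ctx + committed))
    return committed


def generate(prompt: List[int], target: NextToken, draft: NextToken,
             k: int, n_tokens: int) -> Tuple[List[int], int]:
    """Decode n_tokens speculatively; also return the number of verify steps."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n_tokens:
        out += speculative_step(out, target, draft, k)
        steps += 1
    return out[len(prompt):len(prompt) + n_tokens], steps


def greedy(prompt: List[int], target: NextToken, n_tokens: int) -> List[int]:
    """Plain one-token-per-pass greedy decoding, for comparison."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target(out))
    return out[len(prompt):]


if __name__ == "__main__":
    tokens, steps = generate([0], target_next, draft_next, k=3, n_tokens=12)
    assert tokens == greedy([0], target_next, 12)  # output is unchanged
    print(tokens, steps)  # 12 tokens from 4 verify steps instead of 12 passes
```

Because every committed token is one the target model would have picked itself, greedy output matches plain decoding; the gain comes from committing several tokens per verification pass. Real-world speedups such as the reported 1.3–1.35× depend on how often the draft is accepted and on the cost of drafting, which this toy does not capture.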

Impact: Improves local LLM performance and efficiency for users running models on consumer-grade hardware.

Ranking rationale: An update to an existing open-source model, bringing performance improvements and new features.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA · TIER_1 · English · /u/Blahblahblakha · 2026-06-28 18:35

Ornith-1.0-35B GGUF update: native MTP speculative-decode graft + full serving/TTFT/long-context numbers (llama.cpp, tp=1)

Original post: https://www.reddit.com/r/LocalLLaMA/comments/1ui4yn6/ornith1035b_gguf_update_native_mtp/
