StepFun 3.7 Flash model achieves 27.5% faster token generation

By PulseAugur Editorial · [1 source] · 2026-06-06 14:48

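A user benchmarked StepFun Step-3.7-Flash, a large language model with roughly 200 billion total parameters, on an AMD Ryzen AI Max+ 395 APU. The benchmark used a patched llama.cpp build with Vulkan/RADV support and a context size of 12,288 tokens. With multi-token prediction (MTP) enabled, token generation was 27.5% faster, reaching 26.0 tokens/s, while prefill speed stayed essentially unchanged. The MTP run also drew less power than the non-MTP baseline.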
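The two generation figures in the summary (26.0 tokens/s with MTP, a 27.5% gain) are enough to back out the non-MTP baseline, which the summary does not state. The short sketch below does that arithmetic. The function name is hypothetical, and the resulting baseline is a derived estimate that assumes the 27.5% is measured relative to the non-MTP run, not a number taken from the original post.

```python
def implied_baseline(speed_after: float, gain_pct: float) -> float:
    """Return the speed before a percentage gain, given the speed after it."""
    return speed_after / (1.0 + gain_pct / 100.0)


# Figures reported in the summary.
mtp_tokens_per_s = 26.0
gain_pct = 27.5

# Derived estimate (assumption: the gain is relative to the non-MTP run).
baseline_tokens_per_s = implied_baseline(mtp_tokens_per_s, gain_pct)

# Average time per generated token, in milliseconds.
mtp_ms_per_token = 1000.0 / mtp_tokens_per_s
baseline_ms_per_token = 1000.0 / baseline_tokens_per_s

print(f"implied non-MTP baseline: {baseline_tokens_per_s:.1f} tokens/s")
print(f"time per token: {baseline_ms_per_token:.1f} ms -> {mtp_ms_per_token:.1f} ms")
```

Under that assumption the baseline works out to about 20.4 tokens/s, so MTP cuts the average time per generated token from roughly 49 ms to roughly 38 ms.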

Impact: Shows faster inference for large local models, which could make AI applications more responsive on consumer hardware.

Ranking rationale: A user benchmark of a specific model version and its performance characteristics.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA TIER_1 English(EN) · /u/westsunset · 2026-06-06 14:48

StepFun 3.7 Flash MTP Bench Strix Halo

This is the StepFun Step-3.7-Flash UD-IQ4_XS main model with the official StepFun MTP Q8_0 draft model, served through a patched llama.cpp Vulkan/RADV build.

Host

System: AMD Ryzen AI Max+ 395 / Rad…

