Brief · PulseAugur

TOOL · r/LocalLLaMA English(EN) · 4h

StepFun 3.7 Flash MTP Bench Strix Halo

A user benchmarked StepFun's Step-3.7-Flash, a large language model with approximately 200 billion total parameters, on an AMD Ryzen AI Max+ 395 APU (Strix Halo). The benchmark used a patched llama.cpp build running on Vulkan/RADV with a context size of 12,288 tokens. Enabling Multi-Token Prediction (MTP) raised token generation speed by 27.5%, to 26.0 tokens/s, while prefill speed remained largely unchanged. The MTP run also consumed less power than the non-MTP baseline.
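The brief gives the MTP generation speed and the percentage gain but not the baseline figure. As a rough sketch, assuming the 27.5% gain is measured relative to the non-MTP run, the implied baseline can be back-computed. The function name `implied_baseline` is hypothetical, and the resulting number is derived arithmetic, not a measurement reported in the source.

```python
def implied_baseline(boosted_tps: float, gain_pct: float) -> float:
    """Back out the baseline rate from a boosted rate and a relative gain.

    Assumes boosted = baseline * (1 + gain_pct / 100).
    """
    return boosted_tps / (1.0 + gain_pct / 100.0)


# Figures quoted in the brief.
mtp_tps = 26.0   # tokens/s with MTP enabled
gain_pct = 27.5  # reported speedup, in percent

# Derived value (an assumption-based estimate, not a reported result).
baseline_tps = implied_baseline(mtp_tps, gain_pct)

# Per-token latency in milliseconds for each case.
mtp_ms = 1000.0 / mtp_tps
baseline_ms = 1000.0 / baseline_tps

print(f"implied non-MTP baseline: {baseline_tps:.1f} tokens/s")
print(f"per-token latency: {baseline_ms:.1f} ms -> {mtp_ms:.1f} ms")
```

Under that assumption the baseline works out to roughly 20.4 tokens/s, so MTP saves on the order of 10 ms per generated token.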

IMPACT Demonstrates improved inference speed for large local models, potentially enabling more responsive AI applications on consumer hardware.

llama.cpp
AMD Ryzen AI Max+ 395
StepFun Step-3.7-Flash