PulseAugur

Step-3.7-Flash on AMD/ROCm faces context corruption and requires thinking budget

A user running the Step-3.7-Flash model on AMD hardware with ROCm reports two issues. First, on the user's setup ROCm appears to corrupt context beyond roughly 94,000 tokens, causing the model to loop and fail to produce usable answers, while Vulkan remains stable at longer contexts. Second, the model needs a hard 'thinking' token budget to prevent excessive reasoning and empty outputs; a budget of 256 tokens proved effective for classification tasks without significant quality degradation.

IMPACT Users of Step-3.7-Flash on AMD hardware with ROCm should cap context windows below 94k tokens and implement a hard thinking budget for reliable performance.
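The two recommendations can be sketched as a pair of small guards. This is a minimal illustration under stated assumptions, not the original poster's setup: the function names, the `<think>` / `</think>` delimiters, and the idea of filtering a token stream are all hypothetical, and the two constants are simply the figures quoted in the report.

```python
from typing import Iterable, Iterator

# Figures quoted in the report; treat them as starting points, not guarantees.
ROCM_CONTEXT_CAP = 94_000  # approximate context size where corruption was observed
THINKING_BUDGET = 256      # budget reported as sufficient for classification tasks

# Hypothetical delimiters: check the model's chat template for the real ones.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def fits_context(prompt_tokens: int, max_new_tokens: int,
                 cap: int = ROCM_CONTEXT_CAP) -> bool:
    """Return True if the prompt plus planned generation stays under the cap."""
    return prompt_tokens + max_new_tokens < cap


def cap_thinking(tokens: Iterable[str],
                 budget: int = THINKING_BUDGET) -> Iterator[str]:
    """Pass tokens through, force-closing the thinking block after `budget` tokens."""
    thinking = False
    used = 0
    for tok in tokens:
        if tok == THINK_OPEN:
            thinking = True
            used = 0
            yield tok
        elif tok == THINK_CLOSE:
            thinking = False
            yield tok
        elif thinking:
            if used < budget:
                used += 1
                yield tok
            else:
                # Budget exhausted: close the block and stop, so the caller can
                # resume generation from the truncated reasoning.
                yield THINK_CLOSE
                return
        else:
            yield tok
```

In this sketch the filter only truncates the stream; a real integration would also need to stop the running generation and ask the model to continue from the closed thinking block, which depends on the inference server in use.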

RANK_REASON User-reported issues and configuration tips for a specific model and hardware/software combination.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English(EN) · /u/neuromacmd

    Step-3.7-Flash on AMD: ROCm corrupts long context past ~94k, and thinking needs a hard token budget

    Quick notes after running StepFun Step-3.7-Flash on AMD with ROCm.

    The two things that matter most:

    1. Do not run ROCm past ~94k context. On my setup, ROCm corrupts long context somewhere around 94k tokens. The m…