A user running the Step-3.7-Flash model on AMD hardware with ROCm reports two issues. First, ROCm appears to corrupt the context once it grows beyond approximately 94,000 tokens: the model starts looping and stops producing usable answers, whereas the Vulkan backend remains stable at longer contexts. Second, the model needs a hard cap on its 'thinking' tokens to keep it from reasoning at length and then returning empty output. The user found a budget of 256 tokens effective for classification tasks, with no significant loss of quality.
IMPACT Users of Step-3.7-Flash on AMD hardware with ROCm should cap context windows below 94k tokens and implement a hard thinking budget for reliable performance.
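As a rough illustration of the two mitigations, here is a minimal client-side sketch in plain Python. It is not the user's setup and not any inference server's API: the function names, the `<think>`/`</think>` delimiters, and the 90,000-token margin are all assumptions made for the example. Only the approximate 94k limit and the 256-token budget come from the report.

```python
from typing import List, Sequence, Tuple

# Assumption: a safety margin below the ~94,000-token point where ROCm
# reportedly starts corrupting the context. The exact margin is hypothetical.
MAX_CONTEXT_TOKENS = 90_000

# From the report: 256 thinking tokens worked for classification tasks.
THINKING_BUDGET = 256


def cap_context(tokens: Sequence[int], limit: int = MAX_CONTEXT_TOKENS) -> List[int]:
    """Keep only the most recent `limit` tokens of the context."""
    if limit <= 0:
        return []
    return list(tokens[-limit:])


def clip_thinking(
    tokens: Sequence[str],
    budget: int = THINKING_BUDGET,
    open_tag: str = "<think>",    # hypothetical delimiter
    close_tag: str = "</think>",  # hypothetical delimiter
) -> Tuple[List[str], bool]:
    """Cut a reasoning block off once it has used `budget` tokens.

    Returns the kept tokens and a flag saying whether the budget was hit.
    When it is hit, the closing tag is appended and everything after the
    cut is dropped, so the caller can resume generation from the answer.
    """
    out: List[str] = []
    thinking = False
    used = 0
    for tok in tokens:
        if tok == open_tag:
            thinking = True
            out.append(tok)
        elif tok == close_tag:
            thinking = False
            out.append(tok)
        elif thinking:
            if used == budget:
                out.append(close_tag)  # force the reasoning block closed
                return out, True
            used += 1
            out.append(tok)
        else:
            out.append(tok)
    return out, False
```

In a real deployment the cap would more likely be set in the inference server's own configuration. The sketch only shows the logic: trim the oldest tokens to stay under the limit, and force the reasoning block closed once the budget is spent.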
RANK_REASON User-reported issues and configuration tips for a specific model and hardware/software combination.
AI-generated summary · Google Gemini · from 1 source.