A user has run Qwen3.6-35B-A3B, a 35-billion-parameter mixture-of-experts (MoE) model, on two 8-year-old NVIDIA GTX 1080 Ti graphics cards. The setup keeps a significant portion of the model's weights in system (CPU) RAM, with only the active experts fitting into the cards' combined 22GB of VRAM (11GB each). This configuration reaches approximately 20 tokens per second, showing that even older hardware can be viable for sparse MoE models when paired with appropriate quantization and memory-management techniques.
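The memory trade-off behind this setup can be estimated with simple arithmetic. The sketch below is a rough, hypothetical calculator, not the user's actual configuration. It assumes "A3B" means roughly 3 billion parameters active per token (following Qwen's naming convention), uses 1 GB = 1e9 bytes, and tries a few illustrative bits-per-weight (bpw) values because the source does not state which quantization was used. It ignores the KV cache, activations, and per-GPU runtime overhead, so the real amount spilled to system RAM would be larger than shown. The helper names `weight_gb` and `plan` are hypothetical.

```python
# Hypothetical back-of-the-envelope memory budget for a sparse MoE model.
# Assumptions (not from the source): the bpw values tried below, and that
# "A3B" means ~3B parameters are active per token.

TOTAL_PARAMS = 35e9    # 35B total parameters, from the model name
ACTIVE_PARAMS = 3e9    # ~3B active parameters per token (assumed from "A3B")
VRAM_GB = 2 * 11       # two GTX 1080 Ti cards, 11 GB each


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1e9 bytes), weights only."""
    return params * bits_per_weight / 8 / 1e9


def plan(bits_per_weight: float) -> dict:
    """Estimate total weight size, per-token active weights, and RAM spill."""
    total = weight_gb(TOTAL_PARAMS, bits_per_weight)
    active = weight_gb(ACTIVE_PARAMS, bits_per_weight)
    return {
        "bpw": bits_per_weight,
        "total_gb": round(total, 1),
        # Weights that must be read for each generated token (active experts
        # plus shared layers), which bounds per-token memory traffic.
        "active_per_token_gb": round(active, 2),
        # Weights that cannot fit in VRAM and must live in system RAM,
        # ignoring KV cache and runtime overhead.
        "spill_to_ram_gb": round(max(0.0, total - VRAM_GB), 1),
    }


if __name__ == "__main__":
    for bpw in (4.0, 4.5, 6.0, 8.0):
        print(plan(bpw))
```

Under these assumptions, a 4.0 bpw model totals 17.5 GB of weights and fits in 22 GB on paper, while an 8.0 bpw model totals 35 GB and spills about 13 GB to system RAM. Because only around 3B parameters are touched per token, the per-token read is a small fraction of the total, which is why CPU-resident weights can still yield usable generation speeds on an MoE model.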
IMPACT Demonstrates that older, consumer-grade hardware can run large MoE models with careful optimization, potentially lowering the barrier to entry for experimentation.
RANK_REASON User benchmark of running a specific model on older hardware.
AI-generated summary · Google Gemini · from 1 source.