PulseAugur

35B MoE model runs on dual 1080 Ti GPUs with CPU RAM assist

A user has run Qwen3.6-35B-A3B, a 35-billion-parameter mixture-of-experts (MoE) model with about 3 billion parameters active per token, on two 8-year-old NVIDIA GTX 1080 Ti graphics cards. The setup keeps a significant portion of the model's weights in CPU RAM, with only the active experts fitting into the combined 22 GB of VRAM. The configuration reaches approximately 20 tokens per second, showing that older hardware can remain viable for sparse MoE models given appropriate quantization and memory management.
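To make the memory argument concrete, here is a rough back-of-the-envelope sketch of how much space the weights alone would need at different precisions, compared with the 22 GB of combined VRAM. The parameter counts and VRAM figure come from the article. The bits-per-weight values are assumptions for illustration, since the summary does not say which quantization was used, and the function names are hypothetical. The estimate counts weights only and ignores the KV cache, activations, and runtime buffers, so real headroom is smaller than shown.

```python
TOTAL_PARAMS_B = 35.0   # total parameters in billions (from the article)
ACTIVE_PARAMS_B = 3.0   # parameters active per token in billions (from the article)
VRAM_GIB = 22.0         # combined VRAM of the two cards, treated as GiB (approximation)


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given precision (weights only)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def spill_gib(bits_per_weight: float, vram_gib: float = VRAM_GIB) -> float:
    """Weight storage that would not fit in VRAM and must live in CPU RAM."""
    return max(0.0, weight_gib(TOTAL_PARAMS_B, bits_per_weight) - vram_gib)


# Assumed precisions for illustration only; the source does not state one.
for bits in (16, 8, 4):
    total = weight_gib(TOTAL_PARAMS_B, bits)
    active = weight_gib(ACTIVE_PARAMS_B, bits)
    print(
        f"{bits:>2} bits/weight: all weights {total:6.2f} GiB, "
        f"active per token {active:5.2f} GiB, "
        f"spill to CPU RAM {spill_gib(bits):6.2f} GiB"
    )
```

Under these assumptions, the weights touched per token are a small fraction of the total at every precision, which is why a sparse MoE model tolerates having much of its weight storage in slower CPU RAM better than a dense model of the same size would.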

IMPACT Demonstrates that older, consumer-grade hardware can run large MoE models with careful optimization, potentially lowering the barrier to entry for experimentation.

RANK_REASON User benchmark of running a specific model on older hardware.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. dev.to (LLM tag) TIER_1 English(EN) · byeongsoo kang

    Running a 35B MoE (Qwen3.6-35B-A3B) on 2x GTX 1080 Ti in 2026 — Real Benchmarks, and Does the Second GPU Actually Help?

    TL;DR (Quick Answer)

    I actually ran Qwen3.6-35B-A3B, a 35B-parameter mixture-of-experts model (only 3B active per token), on a pair of 8-year-old GTX 1080 Ti cards (22 GB combined). Real, measured numbers:

    - Ge…