Qwen 3.6 35B model runs on consumer hardware with 32k context

By PulseAugur Editorial · [1 source] · 2026-06-05 03:29

A Reddit user shared results from running the Qwen 3.6 35B model on a consumer-grade setup with an RTX 3080 GPU (10GB) and 32GB of RAM. At a 32k context length, they reported 26 tokens/second for generation and 1400 tokens/second for prompt processing. Offloading the KV cache to the GPU raised generation speed to 56 tokens/second but limited the context window, which made that configuration unsuitable for their agentic work involving deep research and document processing.
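To put the reported throughput in terms of wall-clock time, the arithmetic can be sketched as below. This is a rough estimate, not a measurement from the post: it assumes "32k" means 32,768 tokens, that throughput stays constant over the whole run (in practice generation usually slows as the context fills), and it uses a hypothetical 1,000-token reply purely for illustration.

```python
# Back-of-the-envelope latency from the throughput figures in the summary.
# Assumptions (NOT from the post): "32k" = 32,768 tokens, constant throughput,
# and an illustrative 1,000-token reply.

PROMPT_TOKENS = 32 * 1024   # assumed meaning of "32k context"
REPLY_TOKENS = 1_000        # hypothetical reply length

PROMPT_TPS = 1400           # reported prompt-processing speed (tokens/s)
GEN_TPS_32K = 26            # reported generation speed at 32k context
GEN_TPS_KV_ON_GPU = 56      # reported generation speed with KV cache on GPU


def seconds(tokens: int, tokens_per_second: float) -> float:
    """Time needed to handle `tokens` at a constant throughput."""
    return tokens / tokens_per_second


prefill_s = seconds(PROMPT_TOKENS, PROMPT_TPS)
reply_32k_s = seconds(REPLY_TOKENS, GEN_TPS_32K)
reply_kv_gpu_s = seconds(REPLY_TOKENS, GEN_TPS_KV_ON_GPU)
speedup = GEN_TPS_KV_ON_GPU / GEN_TPS_32K

print(f"Processing a full 32k prompt: {prefill_s:.1f} s")
print(f"1,000-token reply at 26 tok/s: {reply_32k_s:.1f} s")
print(f"1,000-token reply at 56 tok/s: {reply_kv_gpu_s:.1f} s")
print(f"Generation speedup with KV cache on GPU: {speedup:.2f}x")
```

Under these assumptions, the faster configuration roughly halves reply time, but the summary notes it could not hold the context length the user needed, so the slower 32k setup was the one that fit their workload.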

IMPACT Provides a performance data point for running large models locally, informing users about achievable speeds and context lengths on consumer-grade GPUs.

RANK_REASON User-shared benchmark and configuration details for a specific model on consumer hardware.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 Deutsch(DE) · /u/AndreVallestero · 2026-06-05 03:29

Qwen 3.6 35B on RTX 3080 10GB + 7700X + 32GB DDR5

Environment:
- GPU: RTX 3080 10GB
- CPU: Ryzen 7 7700X
- RAM: 32GB 6000 MT/s
- OS: CachyOS
- Engine: ik_llamacpp cuda

llama-server \
  --model "Qwen3.6-35B-A3B-UD-Q4_K_S.gguf" \
  --n-gpu-laye…
