User seeks performance boost for Qwen3.5-122B on consumer hardware

A user wants to speed up the Qwen3.5-122B large language model on a machine with 32GB of VRAM and 64GB of RAM. Token generation currently starts at about 6 tokens per second and reaches roughly 20, and the user is asking how to raise that throughput. The post includes the command-line arguments and output logs so that others can help diagnose the problem.
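To put the memory figures above in perspective, here is a rough back-of-the-envelope sketch of how much of the model would have to live outside VRAM. It is only an estimate under stated assumptions: the bits-per-weight values are illustrative (the quantization used in the post is not shown), the function names are hypothetical, and the calculation counts weights only, ignoring KV cache, activations, and operating-system overhead.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Figures taken from the summary: a "122B" model, 32GB VRAM, 64GB RAM.
N_PARAMS = 122e9
VRAM_GB = 32.0
RAM_GB = 64.0


def report(bits_per_weight: float):
    """Return (total weight size, part that spills past VRAM, fits in VRAM+RAM)."""
    total = weight_footprint_gb(N_PARAMS, bits_per_weight)
    spill = max(0.0, total - VRAM_GB)
    fits = total <= VRAM_GB + RAM_GB
    return total, spill, fits


if __name__ == "__main__":
    # Assumed quantization levels, chosen only for illustration.
    for bits in (3.0, 4.0, 5.0, 8.0):
        total, spill, fits = report(bits)
        print(f"{bits:.0f} bits/weight: {total:6.1f} GB total, "
              f"{spill:6.1f} GB outside VRAM, fits in VRAM+RAM: {fits}")
```

At an assumed 4 bits per weight the weights alone come to about 61 GB, so roughly half of the model would sit in system RAM rather than on the GPU, which is one plausible reason throughput is limited on this setup.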

IMPACT This query highlights the ongoing challenges and community efforts in optimizing large language models for efficient local deployment on consumer-grade hardware.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English (EN) · /u/BitGreen1270

    Best tps can I get with Qwen3.5 122B on 32GB VRAM + 64GB RAM?

    My attempt at running Qwen3.5 122B on my 5090 (32GB VRAM) + 64GB RAM is really bleak. I'm getting a speed that starts at 6 tps and ends at ~20 tps. Can I improve this further?

    build/bin/llama-server \
      -m ~/myp/models/unsloth/qwen3.5…