Brief · PulseAugur

COMMENTARY · r/LocalLLaMA · English (EN) · 1w

How do I improve my T/S

A user on the r/LocalLLaMA subreddit is asking how to raise the inference speed, measured in tokens per second (T/S), of their local large language model setup. Their laptop has an RTX 5070 Ti GPU with 12 GB of VRAM, 32 GB of system RAM, and an Intel Core Ultra 9 275HX processor, yet they reach only 37 tokens per second with the Qwen3.6-35B-A3B-Q6_K_P model. They have tried various llama.cpp command-line arguments, as well as different quantization levels and context sizes, without any significant improvement.
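A quick size estimate helps frame the question. The Python sketch below is back-of-the-envelope arithmetic, not a measurement. It assumes 35 billion total and about 3 billion active parameters per token, read from the "35B-A3B" part of the model name, and an average of about 6.5625 bits per weight for Q6_K, the figure llama.cpp documents for that quantization type. Real GGUF files mix tensor types, the "_P" suffix is ignored here, and the KV cache (which grows with context size) and compute buffers are not counted.

```python
# Back-of-the-envelope weight-memory estimate (a sketch, not a measurement).
# Assumptions: 35e9 total / ~3e9 active parameters per token, taken from the
# "35B-A3B" name; Q6_K averages ~6.5625 bits per weight; the "_P" suffix is
# ignored; KV cache and runtime buffers are NOT included.

BITS_PER_WEIGHT_Q6_K = 6.5625
VRAM_GB = 12.0  # RTX 5070 Ti laptop GPU, as stated in the post


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


total_gb = weight_gb(35e9, BITS_PER_WEIGHT_Q6_K)   # every expert's weights
active_gb = weight_gb(3e9, BITS_PER_WEIGHT_Q6_K)   # weights read per token
spill_gb = max(0.0, total_gb - VRAM_GB)            # what cannot stay on the GPU

print(f"total weights : {total_gb:.1f} GB")
print(f"active/token  : {active_gb:.2f} GB")
print(f"fits in VRAM  : {total_gb <= VRAM_GB}")
print(f"spills to RAM : {spill_gb:.1f} GB")
```

Under these assumptions the weights come to roughly 28.7 GB, so well over half of the model cannot fit in 12 GB of VRAM and has to sit in system RAM, while only about 2.5 GB of weights are needed per generated token because the model is a mixture of experts.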

IMPACT · Users running local LLMs on similar consumer hardware may hit the same performance ceiling and may find the suggestions in this discussion useful.

llama.cpp
r/LocalLLaMA
Pi agent
RTX 5070 Ti
Intel Core Ultra 9 275HX
Qwen3.6-35B-A3B-Q6_K_P