PulseAugur

llama.cpp build b9455 achieves 70+ tokens/sec on Qwen3.6-27B

A user on Reddit's r/LocalLLaMA community reported a large speedup with llama.cpp build b9455. Combined with tensor splitting across two RTX 3090 GPUs, the build generated over 70 tokens per second with the Qwen3.6-27B-UD-Q8_K_XL model. That is well above the 30-50 tokens per second range reported for earlier builds, and it matches throughput previously seen only with vLLM.
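To put the reported figures in perspective, the short Python sketch below works out the implied speedup and the time to generate a fixed number of tokens. It treats 70 tokens per second as a lower bound (the post says "over 70") and uses the 30-50 range as the earlier baseline. The function names and the 1,000-token response length are illustrative assumptions, not details from the original post.

```python
# Back-of-the-envelope arithmetic on the reported throughput figures.
# Assumptions: 70 tok/s is a lower bound for build b9455; 30-50 tok/s is
# the earlier range. The 1,000-token response length is hypothetical.

def speedup(new_tps: float, old_tps: float) -> float:
    """Return how many times faster the new throughput is than the old."""
    return new_tps / old_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Return the time in seconds to generate `tokens` at `tps` tokens/sec."""
    return tokens / tps


NEW_TPS = 70.0
OLD_TPS_LOW, OLD_TPS_HIGH = 30.0, 50.0
RESPONSE_TOKENS = 1000  # hypothetical response length

min_speedup = speedup(NEW_TPS, OLD_TPS_HIGH)  # versus the best earlier figure
max_speedup = speedup(NEW_TPS, OLD_TPS_LOW)   # versus the worst earlier figure

print(f"Speedup: {min_speedup:.2f}x to {max_speedup:.2f}x")
for label, tps in [("old, low", OLD_TPS_LOW), ("old, high", OLD_TPS_HIGH), ("new", NEW_TPS)]:
    print(f"{label:>9}: {seconds_for(RESPONSE_TOKENS, tps):.1f} s per {RESPONSE_TOKENS} tokens")
```

On these numbers the gain is somewhere between 1.4x and about 2.3x, depending on which end of the earlier range is used as the baseline.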

IMPACT This update to llama.cpp significantly boosts inference speed for local LLM deployments, potentially enabling more complex models to run efficiently on consumer hardware.

RANK_REASON User-shared benchmark results for an open-source inference engine.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English(EN) · /u/Fabulous_Fact_606

    Another shout out to llama.cpp build b9455 2x3090

    https://www.reddit.com/r/LocalLLaMA/comments/1tvff62/another_shout_out_to_llamacpp_build_b9455_2x3090/