PulseAugur / Brief

last 24h
222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Qwen 3.6 27B (30 GB) Same top p: 98.358 ± 0.033 % vs UD-Q8_K_XL (33 GB) Same top p: 97.426 ± 0.041 %

    A user on r/LocalLLaMA shared benchmarks comparing two quantized versions of the Qwen 3.6 27B model: Qwen3.6-27B-UD-Q8_K_XL and Qwen3.6-27B-Q8-CC. The user built Q8-CC with a custom quantization method that concentrates on the layers showing the highest outlier values after quantization. Initial results suggest that Q8-CC may score slightly better on the KLD and Delta P metrics and reaches a higher "Same top p" (98.358 ± 0.033 % vs 97.426 ± 0.041 %), despite its smaller file (30 GB vs 33 GB).
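    The post, as summarized, does not say how the metrics were computed. Assuming the usual setup, in which a quantized model's next-token distributions are compared position by position against the full-precision model's (the approach of llama.cpp's `llama-perplexity` KL-divergence mode), the three figures can be sketched as follows: mean KL divergence, the share of positions where both models rank the same token first ("Same top p"), and Delta P, read here as the average change in probability assigned to the actual next token. These definitions and all function names are illustrative assumptions, not the poster's code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(P || Q) in nats, with P the reference (full-precision) distribution."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def compare_quant(ref_logits, quant_logits, targets):
    """Compare a quantized model against a reference, position by position.

    ref_logits, quant_logits: one list of vocabulary logits per token position.
    targets: index of the actual next token at each position.
    Returns (mean KLD, "Same top p" as a percentage, mean Delta P).
    """
    n = len(targets)
    kld_sum, same_top, dp_sum = 0.0, 0, 0.0
    for r, q, t in zip(ref_logits, quant_logits, targets):
        p_ref, p_q = softmax(r), softmax(q)
        kld_sum += kl_divergence(p_ref, p_q)
        # "Same top p" (assumed meaning): both models pick the same most likely token.
        if p_ref.index(max(p_ref)) == p_q.index(max(p_q)):
            same_top += 1
        # Delta P (assumed meaning): change in probability of the true next token.
        dp_sum += p_q[t] - p_ref[t]
    return kld_sum / n, 100.0 * same_top / n, dp_sum / n
```

    Feeding identical logits for both models yields a KLD of 0, 100 % top-token agreement and a Delta P of 0, which makes a quick sanity check before running real model outputs through it.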


    IMPACT Custom quantization techniques may preserve more of a model's accuracy at a smaller file size for locally run LLMs.
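    The custom method is described only at a high level: it concentrates on layers that show large outliers after quantization. One hypothetical way to find such layers is to round-trip each layer through a simple 8-bit block quantizer, loosely modeled on GGUF's Q8_0 (32 weights per block, one absmax scale per block), measure the resulting error, and keep the worst offenders at higher precision. The scoring rule, block size and function names below are assumptions for illustration, not the poster's method.

```python
def q8_roundtrip(block):
    """Quantize a block to signed 8-bit with one absmax scale, then dequantize."""
    # A single outlier inflates the scale, coarsening every other weight in the block.
    scale = max(abs(w) for w in block) / 127 or 1.0  # all-zero block: any scale works
    return [round(w / scale) * scale for w in block]

def layer_mse(weights, block_size=32):
    """Mean squared round-trip error of a layer, quantized block by block."""
    total = 0.0
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        total += sum((w - d) ** 2 for w, d in zip(block, q8_roundtrip(block)))
    return total / len(weights)

def layers_to_protect(layers, budget, block_size=32):
    """Names of the `budget` layers with the largest quantization error.

    layers: mapping from a (hypothetical) layer name to a flat list of weights.
    """
    ranked = sorted(layers, key=lambda name: layer_mse(layers[name], block_size), reverse=True)
    return ranked[:budget]
```

    In a toy comparison, a layer of small weights containing one 5.0 spike scores far worse than the same layer without the spike, so it would be the one selected for higher precision.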

  2. Another shout out to llama.cpp build b9455 2x3090

    A user in Reddit's r/LocalLLaMA community reported large performance gains with llama.cpp build b9455. Combined with tensor splitting across two RTX 3090 GPUs, this build exceeded 70 tokens per second with the Qwen3.6-27B-UD-Q8_K_XL model, well above earlier speeds of 30-50 tokens per second and on par with what had previously been achievable only with vLLM.
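    If "tensor splitting" here means dividing individual weight matrices across the two cards (tensor parallelism) rather than assigning whole layers to each card, the idea can be sketched for a single matrix-vector product, with list shards standing in for GPUs. This is a conceptual illustration with hypothetical function names, not llama.cpp's implementation; llama.cpp's `--split-mode` and `--tensor-split` options control how a model is divided across GPUs, but the post, as summarized, does not say which settings were used.

```python
def matvec(matrix, vec):
    """Row-major matrix-vector product: one dot product per output row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def shard_rows(matrix, n_devices):
    """Split the weight rows into n_devices contiguous, near-equal shards."""
    size = -(-len(matrix) // n_devices)  # ceiling division
    return [matrix[i:i + size] for i in range(0, len(matrix), size)]

def split_matvec(matrix, vec, n_devices=2):
    """Each 'device' multiplies its own shard by the full input vector;
    the partial outputs are then concatenated in order."""
    out = []
    for shard in shard_rows(matrix, n_devices):
        out.extend(matvec(shard, vec))  # on real hardware the shards run concurrently
    return out
```

    Because every shard sees the full input, the concatenated result equals the unsplit product exactly; on real hardware the gain comes from both GPUs computing their shards at the same time, at the cost of exchanging results between the cards.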


    IMPACT This update to llama.cpp significantly boosts inference speed for local LLM deployments, potentially enabling more complex models to run efficiently on consumer hardware.
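    A rough check of the numbers behind this item, assuming the 33 GB UD-Q8_K_XL file size from the first item approximates the memory the weights need, an even split across two 24 GB RTX 3090 cards, and no other runtime overhead. The helper names are illustrative.

```python
RTX_3090_VRAM_GB = 24  # memory per card
N_GPUS = 2

def per_gpu_budget(model_gb, n_gpus=N_GPUS, vram_gb=RTX_3090_VRAM_GB):
    """Weights per card and VRAM left over, assuming an even split and no other overhead."""
    weights = model_gb / n_gpus
    return weights, vram_gb - weights

def speedup_range(new_tps, old_low, old_high):
    """Speedup of new_tps relative to both ends of an earlier tokens/s range."""
    return new_tps / old_high, new_tps / old_low

weights, headroom = per_gpu_budget(33)  # 33 GB UD-Q8_K_XL file from the first item
low, high = speedup_range(70, 30, 50)   # "over 70" vs the earlier 30-50 tokens/s
print(f"{weights:.1f} GB of weights and {headroom:.1f} GB free per card")
print(f"speedup: at least {low:.1f}x to {high:.1f}x")
```

    Under these assumptions each card holds about 16.5 GB of weights with roughly 7.5 GB left for the KV cache and buffers, which is why a 33 GB model that cannot fit on one 24 GB card becomes practical on two.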