PulseAugur / Brief

Brief

last 24h
[1/1] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Llama bench and real performance wayy different(Help)

    A user on Reddit's r/LocalLLaMA reports a large gap between benchmarked and real-world speed for the Qwen 3.6-35B-A3B IQ4_XS model. Benchmarks show high tokens-per-second rates for both prompt evaluation and generation, but actual usage is much slower: prompt evaluation runs at 7.79 ms per token (128.30 tokens/sec) and generation at 125.31 ms per token (7.98 tokens/sec). The user is asking for help identifying a misconfiguration or other issue in their setup, which consists of an NVIDIA GeForce RTX 4060 Laptop GPU with 8GB VRAM, 16GB RAM, and a specific llama server configuration.

    IMPACT Shows that benchmark throughput can differ sharply from real-world throughput in local LLM deployments, which matters for performance tuning.
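
    The per-token latencies and tokens-per-second figures quoted in the summary are reciprocals of each other, so either one can be checked against the other. The sketch below is only an illustration of that arithmetic: the function names and the 500-token reply length are hypothetical and do not come from the original post.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to a throughput in tokens/sec."""
    if ms_per_token <= 0:
        raise ValueError("ms_per_token must be positive")
    return 1000.0 / ms_per_token


def seconds_for(n_tokens: int, ms_per_token: float) -> float:
    """Estimate wall-clock seconds needed to process n_tokens at a given latency."""
    return n_tokens * ms_per_token / 1000.0


# Figures reported in the summary above.
prompt_tps = tokens_per_second(7.79)    # prompt evaluation
gen_tps = tokens_per_second(125.31)     # generation

print(f"prompt eval: {prompt_tps:.2f} tokens/sec")
print(f"generation:  {gen_tps:.2f} tokens/sec")

# Hypothetical 500-token reply (an assumption, not from the post).
print(f"500-token reply: {seconds_for(500, 125.31):.1f} s")
```

    The reciprocal of the rounded 7.79 ms figure comes out near 128.4 tokens/sec rather than exactly 128.30; a difference that small is consistent with the latency having been rounded to two decimals before being reported.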