PulseAugur

Qwen 3.6-35B-A3B reaches 977 tk/s prompt processing on Intel Arc B70 Pro

A Reddit user reports running the Qwen 3.6-35B-A3B model on an Intel Arc B70 Pro GPU using llama.cpp with the SYCL backend. The reported prompt processing speed is 977 tokens per second (tk/s), with a context window of 262k tokens. According to the post, the user built a working poker game with this setup without running into model loops or crashes.
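To put the two reported figures in relation, the sketch below estimates how long it would take to process a prompt that fills the whole context window. It assumes a constant 977 tk/s, which is a simplification: prompt processing throughput usually drops as the context grows, so the result is best read as a lower bound. The function name `prefill_seconds` is illustrative and not part of llama.cpp.

```python
def prefill_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Estimate prompt processing time, assuming constant throughput (an assumption)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_tokens / tokens_per_second


# Figures taken from the summary: 262k-token context, 977 tk/s prompt processing.
full_context_tokens = 262_000
reported_rate = 977.0

seconds = prefill_seconds(full_context_tokens, reported_rate)
print(f"Full-context prefill: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

Under that assumption, a completely full context would take about 268 seconds, or roughly four and a half minutes, before the first output token is generated.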

IMPACT Demonstrates high performance for local LLM inference on consumer GPUs, potentially lowering barriers to entry for advanced AI applications.

RANK_REASON User-reported benchmark and setup for a specific model on consumer hardware.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/LocalLLaMA TIER_1 English(EN) · /u/Atomynos_Atom

    Qwen 3.6-35B-A3B with 977 tk/s prompt processing and 262k context window on Intel Arc B70 Pro

    https://www.reddit.com/r/LocalLLaMA/comments/1tukrtf/qwen_3635ba3b_with_977_tks_prompt_processing_and/