A user reports running the Qwen 3.6-35B-A3B model on an Intel Arc B70 Pro GPU using llama.cpp with the SYCL backend. The reported setup reached a prompt processing speed of 977 tokens per second with a context window of 262,000 tokens. The user says they used this configuration to build a working poker game without the model looping or crashing.
AI IMPACT Suggests strong local LLM inference performance on consumer GPUs, potentially lowering barriers to entry for advanced AI applications.
RANK_REASON User-reported benchmark and setup for a specific model on consumer hardware.
AI-generated summary · Google Gemini · from 1 source.