Thomas Bley has released new slides detailing how to run large language models locally. The presentation covers multi-token prediction using the Qwen3.6 35B-A3B model with Nextn quantization. It also covers speech recognition with Qwen-3-ASR, which now works with llama.cpp.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides a guide for local execution of open-source LLMs and ASR models, enabling broader experimentation and use.
RANK_REASON The cluster describes a technical presentation and guide for running open-source models locally, which falls under research and development.