Thomas Bley has released new slides on running large language models locally. The presentation covers multi-token prediction using the Qwen3.6 35B-A3B model with Nextn quantization. It also covers speech recognition with Qwen-3-ASR, which now works with llama.cpp.
AI IMPACT Provides a guide to running open-source LLMs and ASR models locally, enabling broader experimentation and use.
RANK_REASON The cluster describes a technical presentation and guide for running open-source models locally, which falls under research and development.
Source: Mastodon (fosstodon.org)
AI-generated summary · Google Gemini · from 1 source.