Guide released for running Qwen and ASR models locally

By PulseAugur Editorial · [1 source] · 2026-05-18 22:03

Thomas Bley has released an updated version of his "Run LLMs Locally" slides. The new material covers multi-token prediction using the Qwen3.6 35B-A3B model with Nextn quantization, and speech recognition with Qwen-3-ASR, which now works directly with Llama.cpp.

IMPACT Provides a guide for local execution of open-source LLMs and ASR models, enabling broader experimentation and use.

RANK_REASON The cluster describes a technical presentation and guide for running open-source models locally, which falls under research and development.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

Mastodon · fosstodon.org · TIER_1 · English (EN) · 2026-05-18 22:03

New week, new slides: Run LLMs Locally. Now including multi-token prediction using Qwen3.6 35B-A3B with Nextn quantization. Also, speech recognition using Qwen-3-ASR is now working directly with Llama.cpp and is included in the slides. https://codeberg.org/thbley/talks/raw/branch/ma…

LINKS codeberg.org/…/Run_LLMs_Locally_2026_Thom…
