Local LLM Guide Updated with Gemma 4 Speed Boosts and Diagram Tools

By PulseAugur Editorial · [1 source] · 2026-06-08 16:29

Thomas Bley has updated his "Run LLMs Locally" presentation with new examples and performance improvements. The update adds an example of creating Mermaid diagrams in the llama.cpp UI and introduces Quantization-Aware Training (QAT) variants of Gemma 4, which Bley reports are 50 percent faster in token generation on his local setup. The presentation also adds definitions for deterministic and probabilistic results.
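To make the "50 percent faster" figure concrete, here is a minimal sketch of how such a speedup is usually computed from token throughput. The function names and the numbers in the usage note are hypothetical and chosen only for illustration; they are not measurements from the presentation.

```python
def tokens_per_second(n_tokens: int, seconds: float) -> float:
    """Throughput of a generation run: tokens produced per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return n_tokens / seconds


def percent_faster(baseline_tps: float, new_tps: float) -> float:
    """Relative throughput gain of new_tps over baseline_tps, in percent."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return (new_tps / baseline_tps - 1.0) * 100.0


def generation_time(n_tokens: int, tps: float) -> float:
    """Seconds needed to generate n_tokens at a given throughput."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return n_tokens / tps


if __name__ == "__main__":
    # Hypothetical figures, not taken from the source.
    baseline = tokens_per_second(600, 30.0)  # 20 tokens/s
    qat = tokens_per_second(600, 20.0)       # 30 tokens/s
    print(f"{percent_faster(baseline, qat):.0f} percent faster")
    print(f"{generation_time(600, baseline):.0f} s vs {generation_time(600, qat):.0f} s")
```

With these assumed numbers, a 50 percent gain in throughput shortens a 600-token response from 30 seconds to 20 seconds, which is one third less waiting time rather than half.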

IMPACT Provides practical guidance and performance optimizations for running LLMs locally, potentially lowering barriers for developers.

RANK_REASON Update to a guide on running LLMs locally, including performance tweaks and new examples.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

Mastodon — mastodon.social TIER_1 English(EN) · [email protected] · 2026-06-08 16:29

New week, new slides and small updates: Run LLMs Locally Added an example to create Mermaid diagrams in llama.cpp UI. Added QAT (Quantization-Aware Training) variants of Gemma 4 which are 50 percent faster in token generation with my local setup. Added definitions for Determinist…

LINKS codeberg.org/…/Run_LLMs_Locally_2026_Thom…
