Ollama has released version 0.31.1, which significantly improves the performance of Gemma 4 on Apple Silicon. The update uses multi-token prediction (MTP) to generate tokens nearly 90% faster on average, a gain reported for a coding-agent benchmark. The optimization is intended to improve the experience of running AI models locally.
AI IMPACT This update speeds up local execution of AI models on Apple hardware, which could improve developer workflows and accessibility.
RANK_REASON Software release for an AI model runner, not a frontier model release.
Source: Mastodon (fosstodon.org)
AI-generated summary · Google Gemini · from 1 source.