Brief · PulseAugur

TOOL · Mastodon — mastodon.social English (EN) · 5h

Okay, so on my Lenovo laptop with Nvidia 4070 GPU, 8 GB VRAM, Gemma4:12b-it-qat runs at a good 13 tokens per second. And I can live with that. I mean, local AI

A user reported that the Gemma4:12b-it-qat model runs at approximately 13 tokens per second on a Lenovo laptop equipped with an NVIDIA 4070 GPU and 8 GB of VRAM. The user considers this speed acceptable for local AI use, and it represents an improvement over previous, less capable models on the same hardware. The user also noted the utility of Ollama's cloud models, particularly the $20 per month plan, on which they have not yet hit usage limits.
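For readers who want to compare their own hardware against the reported figure, here is a minimal sketch of the arithmetic behind a tokens-per-second number. It assumes a token count and a generation duration in nanoseconds, which is how Ollama's generate API reports its `eval_count` and `eval_duration` fields. The sample values and both helper function names are hypothetical and chosen only to land near 13 tokens per second; they are not measurements from the original post.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation throughput from a token count and a duration in nanoseconds."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def seconds_for_reply(num_tokens: int, rate: float) -> float:
    """Approximate wall-clock time to generate num_tokens at a steady rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return num_tokens / rate


# Hypothetical sample values (not from the post): 260 tokens in 20 seconds.
sample = {"eval_count": 260, "eval_duration": 20_000_000_000}

rate = tokens_per_second(sample["eval_count"], sample["eval_duration"])
print(f"{rate:.1f} tokens/s")
print(f"{seconds_for_reply(500, rate):.1f} s for a 500-token reply")
```

At roughly 13 tokens per second, a 500-token answer takes a bit under 40 seconds, which gives a sense of what "can live with that" means in practice.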

IMPACT Demonstrates increasing viability of running capable LLMs locally on consumer-grade hardware.

NVIDIA
Claude Opus
Ollama
Gemma
Lenovo
4070