A user reported that the gemma4:26b-a4b-it-qat model reached 15 tokens per second on an Nvidia 4070 GPU with 8GB VRAM and 16GB RAM. Running on Windows 11, it was nearly as fast as a 12B model, and the user was surprised by its efficiency.
IMPACT Suggests that a 26B-class model can run at usable speeds on consumer hardware, potentially lowering barriers to entry for AI experimentation.
RANK_REASON User report on model performance on consumer hardware.
Read on Mastodon — fosstodon.org →
AI-generated summary · Google Gemini · from 1 source.