A recent analysis evaluated eight local Large Language Models (LLMs) run through Ollama and compared them by cost per correct answer, with cost derived from GPU energy consumption. The Gemma 4:26b model was the most efficient, reaching 96.9% accuracy at €0.013 per 1,000 correct answers. At the other end, Qwen 3:8b-fp16 was the most expensive at €0.239 per 1,000 correct answers, with a lower accuracy of 66.7%. The study found that larger models and higher numerical precision did not necessarily deliver better value, and that "reasoning" or "thinking" modes consumed more energy without improving accuracy on deterministic tasks.
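One plausible way to compute this metric is to divide the total GPU energy cost by the number of correct answers. The sketch below assumes that reading. The function name `cost_per_1000_correct`, the electricity price, and the energy figure are hypothetical, because the summary does not say how the study measured energy or which tariff it used. Only the two per-1,000 figures in the final ratio come from the summary.

```python
# Hypothetical sketch: the summary does not state the study's exact formula,
# electricity price, or measured energy, so the example inputs are illustrative.

def cost_per_1000_correct(
    energy_kwh: float,
    price_eur_per_kwh: float,
    n_questions: int,
    accuracy: float,
) -> float:
    """Return the GPU energy cost in euros per 1,000 correct answers."""
    correct = n_questions * accuracy  # expected number of correct answers
    if correct <= 0:
        raise ValueError("no correct answers; cost per correct answer is undefined")
    total_cost_eur = energy_kwh * price_eur_per_kwh  # energy cost of the whole run
    return total_cost_eur / correct * 1000


# Illustrative run with assumed values: 1,000 questions, 0.05 kWh, €0.30/kWh.
example = cost_per_1000_correct(
    energy_kwh=0.05, price_eur_per_kwh=0.30, n_questions=1000, accuracy=0.969
)
print(f"€{example:.4f} per 1,000 correct answers")

# Ratio between the two published figures from the summary.
ratio = 0.239 / 0.013
print(f"Qwen 3:8b-fp16 costs about {ratio:.1f}x more per correct answer than Gemma 4:26b")
```

Because the divisor is the number of correct answers rather than the number of questions, a model with the same energy use but lower accuracy ends up with a higher cost per correct answer. Under this reading, the metric penalizes both energy consumption and errors.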
IMPACT Provides a cost-per-performance metric for local LLM deployment, guiding users toward efficient hardware and model choices.
RANK_REASON Analysis of local LLM performance and cost-effectiveness.
- Gemma 3:1b
- Gemma 3:27b
- Gemma 4:26b
- Ollama
- Qwen 3:30b (MoE)
- Qwen 3:8b
- Qwen 3:8b-fp16
- Qwen 3:8b (Q4_K_M)
- Qwen 3:8b (Q8_0)
- RTX 3090
AI-generated summary · Google Gemini · from 1 source.