PulseAugur

Gemma 4:26b leads local LLMs in cost-efficiency per correct answer

A recent analysis evaluated eight local large language models (LLMs) available through Ollama, ranking them by cost per correct answer, where cost is derived from measured GPU energy consumption. Gemma 4:26b was the most efficient, reaching 96.9% accuracy at €0.013 per 1,000 correct answers. Qwen 3:8b-fp16 was the most expensive at €0.239 per 1,000 correct answers, with a lower accuracy of 66.7%. The study found that larger models and higher precision did not necessarily deliver better value, and that "reasoning" or "thinking" modes consumed more energy without improving accuracy on deterministic tasks.
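
The metric described above (metered GPU energy priced in euros, divided by the number of correct answers) can be sketched roughly as follows. This is a minimal illustration, not the author's code: the function names, the `Run` record, the electricity price, and all numbers in the example are hypothetical placeholders and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run for one model (all fields are illustrative)."""
    model: str
    energy_kwh: float  # GPU energy metered over the whole run
    correct: int       # number of correctly answered questions
    total: int         # number of questions asked


def cost_per_1k_correct(energy_kwh: float, price_eur_per_kwh: float, correct: int) -> float:
    """Return euros per 1,000 correct answers: (energy * price) / correct * 1000."""
    if correct <= 0:
        raise ValueError("cost per correct answer is undefined with zero correct answers")
    return energy_kwh * price_eur_per_kwh / correct * 1000


def rank_by_cost(runs: list[Run], price_eur_per_kwh: float) -> list[Run]:
    """Sort runs from cheapest to most expensive per correct answer."""
    return sorted(
        runs,
        key=lambda r: cost_per_1k_correct(r.energy_kwh, price_eur_per_kwh, r.correct),
    )


if __name__ == "__main__":
    # Hypothetical inputs: an assumed tariff and two made-up runs.
    price = 0.30  # EUR per kWh (assumption, not from the source)
    runs = [
        Run("model-a", energy_kwh=0.05, correct=96, total=100),
        Run("model-b", energy_kwh=0.20, correct=67, total=100),
    ]
    for r in rank_by_cost(runs, price):
        cost = cost_per_1k_correct(r.energy_kwh, price, r.correct)
        print(f"{r.model}: accuracy {r.correct / r.total:.1%}, EUR {cost:.3f} per 1k correct")
```

Dividing by correct answers rather than total answers is what penalizes an inaccurate model twice: it spends energy on every question but only the correct ones count toward the denominator.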

IMPACT Provides a cost-per-performance metric for local LLM deployment, guiding users toward efficient hardware and model choices.

RANK_REASON Analysis of local LLM performance and cost-effectiveness.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. dev.to (LLM tag) · TIER_1 · English (EN) · Arsen Apostolov

    How to Rank Local LLMs by Cost per Correct Answer (Measured GPU Energy, 8 Ollama Models)

    TL;DR: I priced 8 local Ollama models by € per 1,000 correct answers (metered GPU energy ÷ correct answers) on one RTX 3090. gemma4:26b won at 96.9% accuracy for €0.013/1k-correct. The most expensive model (…