Local LLM Costs Revealed: Smaller Models Cheaper Than Cloud, Larger Ones More Expensive

A controlled benchmark on a single machine with an RTX 3090 GPU measured the actual cost of running local large language models (LLMs) in euros per million tokens. Smaller models such as Gemma 3:1B were significantly cheaper than hosted APIs, at approximately €0.118 per million tokens. Larger models such as Gemma 3:27B proved more expensive to run locally because of high energy consumption and lower throughput, at €0.706 per million tokens before accounting for hardware depreciation.
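As a rough illustration of how an energy-only figure in euros per million tokens can be computed, the sketch below combines average power draw, generation throughput, and an electricity price. The function name and all input values are hypothetical placeholders, not measurements from the benchmark, and the benchmark's own method may differ.

```python
def eur_per_million_tokens(avg_power_watts, tokens_per_second, eur_per_kwh):
    """Energy-only cost of generating one million tokens, in euros.

    All three inputs are assumptions supplied by the caller; hardware
    depreciation is deliberately left out, as in the figures quoted above.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    # Time needed to generate one million tokens at this throughput.
    seconds = 1_000_000 / tokens_per_second
    # Convert watt-seconds (joules) to kilowatt-hours.
    kwh = avg_power_watts * seconds / 3_600_000
    return kwh * eur_per_kwh


# Hypothetical inputs: 350 W average draw, 100 tokens/s, 0.30 EUR per kWh.
example_cost = eur_per_million_tokens(350, 100, 0.30)
print(round(example_cost, 3))  # prints 0.292
```

At a fixed electricity price the cost scales with power draw divided by throughput, so a model that generates tokens more slowly at similar wattage costs proportionally more per token.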

IMPACT Highlights that the cost-effectiveness of running LLMs locally is highly dependent on model size and hardware efficiency, challenging the assumption that local deployment is always cheaper.

RANK_REASON The item details a specific, reproducible benchmark and analysis of LLM operational costs, akin to a research finding.

AI-generated summary · Google Gemini · from 1 sources. How we write summaries →

COVERAGE [1]

  1. dev.to — LLM tag TIER_1 English(EN) · Arsen Apostolov ·

    How Much Does It Actually Cost to Run a Local LLM? (€ per Million Tokens, Measured)

    "It runs on my own GPU, so it's basically free." I believed that until I put a meter on it. So I ran a controlled benchmark on one box, an openSUSE machine with a single RTX 3090, driving three local models through ollama under an identical fixed workload (256-token generati…