PulseAugur

42 LLMs benchmarked for speed: Smaller models often faster

An independent tracker, ollamatps.com, has benchmarked 42 large language models (LLMs) to measure how fast they actually respond. It reports two separate metrics: Time to First Token (TTFT), the delay before the first token arrives, and Tokens Per Second (TPS), the rate at which output is generated. The benchmark, developed by Anton Gulin, a former Apple engineer, uses a fixed prompt and a fixed output cap, and re-tests continuously to keep the results reliable. The results indicate that model size does not reliably predict speed, with smaller models often outperforming larger ones. TTFT varies widely, with some models as much as 80 times slower to produce a first token.
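The difference between the two metrics is easiest to see in code. The following is a minimal sketch, not the tracker's actual implementation: `measure_stream`, `SpeedResult`, and `fake_model` are hypothetical names, and the token stream is simulated rather than coming from a real model. It assumes TTFT is timed from the start of the request to the first token, and TPS is computed over the decode window after the first token, with the output capped at `max_tokens`.

```python
import time
from typing import Callable, Iterable, Iterator, NamedTuple


class SpeedResult(NamedTuple):
    ttft_s: float        # seconds from request start to first token
    tokens_per_s: float  # decode rate measured after the first token
    n_tokens: int        # tokens actually received (bounded by the cap)


def measure_stream(
    stream: Iterable[str],
    max_tokens: int,
    clock: Callable[[], float] = time.perf_counter,
) -> SpeedResult:
    """Time a token stream, stopping once max_tokens (the output cap) is hit."""
    start = clock()
    first_at = None
    last_at = start
    n = 0
    for _ in stream:
        now = clock()
        if first_at is None:
            first_at = now  # first token arrived: this fixes TTFT
        last_at = now
        n += 1
        if n >= max_tokens:
            break
    if first_at is None:
        raise ValueError("stream produced no tokens")
    decode_time = last_at - first_at
    # n - 1 tokens arrive during the decode window that follows the first token.
    tps = (n - 1) / decode_time if decode_time > 0 else float("nan")
    return SpeedResult(first_at - start, tps, n)


def fake_model(first_delay: float, per_token: float, count: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model: slow start, then steady output."""
    time.sleep(first_delay)
    for i in range(count):
        if i:
            time.sleep(per_token)
        yield f"tok{i}"


if __name__ == "__main__":
    # Same cap for every run, mirroring the fixed-output-cap idea in the summary.
    result = measure_stream(fake_model(0.05, 0.01, 20), max_tokens=10)
    print(f"TTFT: {result.ttft_s:.3f}s  TPS: {result.tokens_per_s:.1f}  tokens: {result.n_tokens}")
```

Keeping the two numbers separate matters because a model can stream quickly once it starts yet still feel slow if the first token takes a long time to arrive.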

IMPACT Shows that response-speed metrics such as TTFT and TPS, not only intelligence scores, matter for LLM deployment and user experience.

RANK_REASON Independent benchmark of multiple LLMs measuring speed metrics.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. dev.to (LLM tag) · TIER_1 · English (EN) · Anton Gulin

    I measure how fast 42 LLMs actually answer. Here's the honest method.

    I test software for a living. So when a vendor calls an AI model "fast," I don't trust the word. I measure it.

    Most leaderboards rank how smart a model is. Almost none rank how fast it answers. You pick a model because it scored well, ship it, and then your users sit an…