Brief · PulseAugur

TOOL · dev.to — LLM tag English(EN) · 8h

I measure how fast 42 LLMs actually answer. Here's the honest method.

The independent tracker ollamatps.com has benchmarked 42 large language models (LLMs) on actual response speed, reporting Time to First Token (TTFT) and Tokens Per Second (TPS) as separate metrics. The benchmark, developed by Anton, a former Apple engineer, uses a fixed prompt and a fixed output cap, and re-tests continuously to keep results reliable. Results indicate that model size does not correlate with speed: smaller models often outperform larger ones. TTFT also varies widely, with some models up to 80 times slower to return a first token.
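The TTFT/TPS distinction can be illustrated with a minimal sketch. This is not the tracker's actual code; it assumes the response arrives as a generic iterable of tokens, and the names `measure_stream`, `SpeedResult`, and `max_tokens` are hypothetical. TTFT is taken as the time from request start to the first token, and TPS as the rate of the remaining tokens after that first one, up to the output cap.

```python
import time
from typing import Callable, Iterable, NamedTuple


class SpeedResult(NamedTuple):
    ttft_s: float  # seconds from request start to the first token
    tps: float     # tokens per second, counted after the first token
    tokens: int    # number of tokens consumed (at most the output cap)


def measure_stream(stream: Iterable[str],
                   max_tokens: int,
                   clock: Callable[[], float] = time.perf_counter) -> SpeedResult:
    """Time a token stream, stopping at a fixed output cap (hypothetical helper)."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in stream:
        now = clock()
        if first is None:
            first = now  # arrival of the first token defines TTFT
        last = now
        count += 1
        if count >= max_tokens:
            break  # fixed output cap keeps runs comparable
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    # The first token is excluded so that TTFT does not distort TPS.
    tps = (count - 1) / decode_time if decode_time > 0 else 0.0
    return SpeedResult(first - start, tps, count)
```

Keeping the two numbers separate matters because a model can be slow to start yet fast once it begins generating, or the reverse; a single end-to-end average would hide that.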

IMPACT Highlights the critical importance of speed metrics beyond raw intelligence for LLM deployment and user experience.

Apple Inc.
Ollama Cloud
HTTP
Anton
ollamatps.com