Brief · PulseAugur

TOOL · dev.to — LLM tag English(EN) · 3h

On-device LLM on iPhone: which runtime is fastest? MLX vs llama.cpp vs LiteRT-LM vs CoreML

A recent benchmark tested four on-device LLM runtimes on an iPhone 17 Pro, comparing decode speed and memory usage. MLX emerged as the fastest for general-purpose models like Qwen 3.5 2B, while LiteRT-LM excelled specifically with Gemma 4 E2B. For memory-constrained scenarios, CoreML with the Apple Neural Engine offered significant advantages, using substantially less RAM. AI

IMPACT Provides crucial performance data for developers choosing on-device LLM runtimes for iPhones, impacting app efficiency and user experience.

llama.cpp
iPhone 17 Pro
Gemma 4 E2B
MLX
Apple Neural Engine
LiteRT-LM
Qwen 3.5 2B