On-device LLM on iPhone: which runtime is fastest? MLX vs llama.cpp vs LiteRT-LM vs CoreML
A recent benchmark tested four on-device LLM runtimes on an iPhone 17 Pro, comparing decode speed and memory usage. MLX emerged as the fastest for general-purpose models like Qwen 3.5 2B, while LiteRT-LM excelled specifically with Gemma 4 E2B. For memory-constrained scenarios, CoreML with the Apple Neural Engine offered significant advantages, using substantially less RAM. AI
IMPACT Provides crucial performance data for developers choosing on-device LLM runtimes for iPhones, impacting app efficiency and user experience.