MLX, LiteRT-LM, and CoreML benchmarked for iPhone LLM performance

By PulseAugur Editorial · [1 source] · 2026-06-02 05:56

A recent benchmark tested four on-device LLM runtimes (MLX, llama.cpp, LiteRT-LM, and CoreML) on an iPhone 17 Pro, comparing decode speed and memory usage. MLX was the fastest for general-purpose models such as Qwen 3.5 2B, while LiteRT-LM was fastest specifically with Gemma 4 E2B. For memory-constrained scenarios, CoreML running on the Apple Neural Engine used substantially less RAM.

IMPACT Provides performance data for developers choosing an on-device LLM runtime for iPhone, a choice that affects app efficiency and user experience.

RANK_REASON Benchmarking study comparing multiple software runtimes for a specific hardware platform.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

dev.to — LLM tag TIER_1 English(EN) · Daisuke Majima · 2026-06-02 05:56

On-device LLM on iPhone: which runtime is fastest? MLX vs llama.cpp vs LiteRT-LM vs CoreML

I want to run an LLM on iPhone. But there are several runtimes and it's not obvious which to pick. And I couldn't find many head-to-head benchmarks. …
