Brief · PulseAugur

TOOL · r/LocalLLaMA · English (EN) · 2d

Benchmarked inference engines on an M1 Max (64 GB): results & analysis

A hobbyist benchmarked several inference engines on an M1 Max MacBook Pro using the Qwen3.5-4B model. The results, submitted to the mlx-chronos community benchmark, indicate that rapid-mlx was the fastest and most memory-efficient of the engines tested. The author now uses rapid-mlx to serve the Qwen35b-A3b model.

IMPACT Offers practical guidance for running LLMs locally on Apple Silicon by comparing the speed and memory efficiency of available inference engines.

ollama
Hermes Agent
M1 Max
Qwen3.5-4B
rapid-mlx
Qwen35b-A3b
mlx-chronos