Mininglamp AI adds W8A8 quantization to MLX for faster Apple Silicon inference

By PulseAugur Editorial · [1 source] · 2026-05-25 08:16

Mininglamp AI has developed Cider, a new SDK that extends the MLX framework with W8A8 quantization, in which both weights and activations are represented as 8-bit integers. The optimization speeds up the prefill stage for large vision-language models on Apple Silicon: on an M5 Pro chip, prefill time drops from 2.84 s to 2.52 s, a reduction of about 11%. The SDK uses custom Metal kernels and applies to models running through MLX, though INT8 TensorOps are limited to M5 processors and later.
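To make the idea concrete, here is a minimal pure-Python sketch of a W8A8 matrix multiply. It is an illustration under stated assumptions, not Cider's implementation: the source does not describe the quantization scheme, so the symmetric per-row scaling below is assumed, and the names `quantize_rows` and `w8a8_matmul` are hypothetical. Real kernels would run this on the GPU in Metal rather than in Python loops.

```python
# Illustrative W8A8 sketch; NOT Cider's Metal kernels or API.
# Assumption: symmetric INT8 quantization with one scale per row.

def quantize_rows(matrix):
    """Quantize each row to integers in [-127, 127]; return (rows, scales)."""
    q_rows, scales = [], []
    for row in matrix:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def w8a8_matmul(activations, weights):
    """Approximate activations @ weights^T using 8-bit operands.

    activations: [tokens][in_features], quantized per token (row).
    weights:     [out_features][in_features], quantized per output channel (row).
    """
    qa, a_scales = quantize_rows(activations)
    qw, w_scales = quantize_rows(weights)
    out = []
    for a_row, a_s in zip(qa, a_scales):
        out_row = []
        for w_row, w_s in zip(qw, w_scales):
            acc = sum(a * w for a, w in zip(a_row, w_row))  # integer accumulation
            out_row.append(acc * a_s * w_s)  # dequantize with both scales
        out.append(out_row)
    return out


def float_matmul(activations, weights):
    """Full-precision reference for comparison."""
    return [[sum(a * w for a, w in zip(a_row, w_row)) for w_row in weights]
            for a_row in activations]


x = [[0.5, -1.2, 0.3, 2.0], [1.5, 0.1, -0.7, 0.0]]
w = [[0.2, -0.4, 0.1, 0.3], [-0.6, 0.05, 0.9, -0.2], [0.0, 0.7, -0.3, 0.4]]
approx = w8a8_matmul(x, w)
exact = float_matmul(x, w)
max_err = max(abs(a - e) for ra, re in zip(approx, exact) for a, e in zip(ra, re))
print("max abs error vs float:", max_err)
```

The inner product runs entirely on small integers, and the two floating-point scales are applied once per output element. That is the property that lets hardware INT8 matrix units do the heavy work during prefill, at the cost of a small rounding error.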

IMPACT Improves inference speed for AI models on Apple Silicon, potentially accelerating local AI development and deployment.

RANK_REASON This is a software tool release that enhances an existing framework, not a core model release or significant industry-wide event.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/Enough-Astronaut9278 · 2026-05-25 08:16

We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro

https://www.reddit.com/r/LocalLLaMA/comments/1tn2p61/we_added_w8a8_activation_quantization_to_mlx/
