PulseAugur
tool · [1 source] ·

Mininglamp AI adds W8A8 quantization to MLX for faster Apple Silicon inference

Mininglamp AI has developed Cider, an SDK that extends the MLX framework with W8A8 quantization, in which activations as well as weights are quantized to 8 bits. The optimization speeds up the prefill stage for large vision-language models on Apple Silicon, cutting prefill time from 2.84s to 2.52s on an M5 Pro chip, a reduction of about 11%. The SDK uses custom Metal kernels and applies to models running through MLX, though INT8 TensorOps are limited to M5 processors and above.
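
To make the W8A8 idea concrete, here is a minimal pure-Python sketch of a dot product in which both the weights and the activations are quantized to 8-bit integers before multiplying. It is not Cider's implementation or its Metal kernels: the per-tensor symmetric scaling scheme and the names `quantize_int8` and `w8a8_dot` are assumptions chosen for illustration only.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the int8 range.

    Returns (quantized, scale) such that value is approximately q * scale.
    The per-tensor scale is an illustrative assumption, not Cider's scheme.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def w8a8_dot(weights, activations):
    """Dot product with both operands quantized to 8 bits (W8A8)."""
    qw, w_scale = quantize_int8(weights)
    qa, a_scale = quantize_int8(activations)
    # Integer multiply-accumulate; hardware kernels typically use a wider
    # integer accumulator for this step.
    acc = sum(w * a for w, a in zip(qw, qa))
    # A single floating-point rescale recovers the real-valued result.
    return acc * w_scale * a_scale


weights = [0.12, -0.50, 0.33, 0.91]
activations = [1.5, -2.0, 0.25, 0.75]

exact = sum(w * a for w, a in zip(weights, activations))
approx = w8a8_dot(weights, activations)
print(f"exact={exact:.4f} approx={approx:.4f} error={abs(exact - approx):.4f}")
```

The point of the sketch is that the inner loop runs entirely on small integers, which is the part integer matrix hardware can accelerate; the floating-point work is reduced to computing the scales and one final rescale.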

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Improves inference speed for AI models on Apple Silicon, potentially accelerating local AI development and deployment.

RANK_REASON This is a software tool release that enhances an existing framework, not a core model release or significant industry-wide event.



COVERAGE [1]

  1. r/LocalLLaMA TIER_1 · /u/Enough-Astronaut9278 ·

    We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro

    https://www.reddit.com/r/LocalLLaMA/comments/1tn2p61/we_added_w8a8_activation_quantization_to_mlx/