PulseAugur / Brief

Last 24 hours, 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. We added W8A8 activation quantization to MLX — prefill went from 2.84s to 2.52s on M5 Pro

    Mininglamp AI has developed Cider, an SDK that extends the MLX framework with W8A8 activation quantization (8-bit weights and 8-bit activations). On an M5 Pro, it cuts prefill time for large vision-language models on Apple Silicon from 2.84s to 2.52s, a reduction of roughly 11%. The SDK uses custom Metal kernels to speed up models run through MLX, though INT8 TensorOps are available only on M5 and later chips.


    IMPACT Shortens prefill (prompt-processing) time for models run through MLX on Apple Silicon, which could speed up local AI development and deployment.
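For readers unfamiliar with W8A8, here is a minimal sketch of the arithmetic involved. Prefill processes the whole prompt at once and is generally compute-bound, so quantizing activations as well as weights lets the matrix multiply itself run on 8-bit integers rather than only shrinking the stored weights. This pure-Python illustration is not Cider's or MLX's code (Cider uses custom Metal kernels that are not shown here). It assumes symmetric per-tensor scales and round-to-nearest; real kernels often use per-channel or per-token scales. The function names `quantize_int8`, `int8_matmul`, `w8a8_linear`, and `float_matmul` are hypothetical.

```python
# Minimal sketch of a W8A8 (8-bit weight, 8-bit activation) quantized matmul.
# Assumptions (not from the source): symmetric per-tensor scales, round-to-nearest,
# integer accumulation. Illustrates the general technique only; hypothetical names.


def quantize_int8(matrix):
    """Symmetric per-tensor INT8 quantization: returns (int values in [-127, 127], scale)."""
    max_abs = max(abs(v) for row in matrix for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Python's round() rounds to nearest (ties to even); clamp to the symmetric INT8 range.
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in matrix]
    return q, scale


def int8_matmul(a_q, w_q):
    """Integer matmul: INT8 x INT8 products summed in an integer accumulator (int32 on GPUs)."""
    inner = len(w_q)
    cols = len(w_q[0])
    return [[sum(a_row[k] * w_q[k][j] for k in range(inner)) for j in range(cols)]
            for a_row in a_q]


def w8a8_linear(activations, weights):
    """Quantize both operands, multiply in integers, then dequantize by the product of scales."""
    a_q, a_scale = quantize_int8(activations)
    w_q, w_scale = quantize_int8(weights)
    acc = int8_matmul(a_q, w_q)
    return [[v * a_scale * w_scale for v in row] for row in acc]


def float_matmul(a, w):
    """Full-precision reference matmul for comparison."""
    return [[sum(a_row[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
            for a_row in a]


if __name__ == "__main__":
    x = [[0.5, -1.2, 3.0], [2.2, 0.1, -0.7]]     # activations: 2 tokens x 3 features
    w = [[0.3, -0.8], [1.1, 0.05], [-0.4, 0.9]]  # weights: 3 inputs x 2 outputs
    approx = w8a8_linear(x, w)
    exact = float_matmul(x, w)
    err = max(abs(a - e) for ra, re_ in zip(approx, exact) for a, e in zip(ra, re_))
    print(f"max abs error vs float matmul: {err:.4f}")
```

The dequantization step works because each output is a sum of products (a_q * s_a) * (w_q * s_w), so the two scales factor out of the integer sum; the small residual error comes from rounding each operand to 8 bits.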