PulseAugur / Brief


last 24h
[2/2] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
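As a rough illustration of how four such factors could combine into a single 0–100 score, here is a minimal sketch. The weights, the 12-hour half-life, and the function name `brief_score` are all assumptions made for this example; the page names the four factors but does not say how they are combined.

```python
# Hypothetical weights: the source names the factors, not their relative importance.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.3, "headline_signal": 0.3}
HALF_LIFE_HOURS = 12.0  # assumed half-life for the time-decay factor


def _clamp01(value: float) -> float:
    """Clamp a signal into the [0, 1] range."""
    return max(0.0, min(1.0, value))


def brief_score(authority: float, cluster_strength: float,
                headline_signal: float, age_hours: float) -> float:
    """Weighted sum of three signals in [0, 1], scaled by exponential time decay.

    Returns a score between 0 and 100, rounded to one decimal place.
    """
    base = (
        WEIGHTS["authority"] * _clamp01(authority)
        + WEIGHTS["cluster_strength"] * _clamp01(cluster_strength)
        + WEIGHTS["headline_signal"] * _clamp01(headline_signal)
    )
    # The score halves every HALF_LIFE_HOURS; negative ages are treated as zero.
    decay = 0.5 ** (max(0.0, age_hours) / HALF_LIFE_HOURS)
    return round(100.0 * base * decay, 1)


if __name__ == "__main__":
    # A strong, well-sourced story that is six hours old.
    print(brief_score(authority=0.9, cluster_strength=0.7,
                      headline_signal=0.8, age_hours=6.0))
```

Applying decay as a multiplier rather than as a fourth weighted term means an old story can never outrank a fresh one on age alone, which is one plausible reading of "time decay" but not the only one.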

  1. I made Ideogram4 NF4 run on 16GB: 512x512 in ~11 min at 11.51 GB peak on a 16GB M2 Pro. You can run it yourself. Demo live on the little box right now.

    A developer has optimized the Ideogram4 image generation model to run on a 16GB Apple Silicon Mac (an M2 Pro). Using native MLX kernels and NF4 quantization, the port generates a 512x512 image in approximately 11 minutes with a peak memory usage of 11.51 GB. The developer has released the code and a live demo, and reports that this NF4 version is faster than mFLUX FP8 and GGUF Q4 on comparable hardware.


    IMPACT Enables running advanced image generation models on consumer-grade hardware with limited RAM.

  2. ggml-webgpu: Improve prefill speeds for k-quants + refactor matmul for Q4/Q5/Q8 and k-quants by yomaytk · Pull Request #24225 · ggml-org/llama.cpp

    A pull request (#24225) for the llama.cpp project optimizes the ggml WebGPU backend for k-quantized models, improving prefill speeds. It also refactors the matrix multiplication (matmul) operations for the Q4, Q5, and Q8 quantization levels and for k-quants. Benchmarks on an M2 Pro chip show speedups of up to 3.78x for certain quantizations, which benefits large language models run locally.


    IMPACT Improves performance for running local LLMs, potentially enabling more complex models on consumer hardware.