MLX
PulseAugur coverage of MLX: every cluster mentioning MLX across labs, papers, and developer communities, ranked by signal.
3 days with sentiment data
-
Mininglamp AI adds W8A8 quantization to MLX for faster Apple Silicon inference
Mininglamp AI has developed Cider, a new SDK that enhances the MLX framework by adding W8A8 activation quantization. This optimization significantly speeds up the prefill process for large vision-language models on Appl…
-
Open-source AI tools advance real-time video and film generation
Two open-source AI projects are making strides in multimedia generation. Fasterliveportrait-mlx is integrating MLX for real-time human synthesis and audio-video creation, focusing on Apple Silicon. Utopai's PAI aims to …
-
Google Spark vs. OpenClaw: AI debate centers on workflow control, not model smarts
A Reddit discussion reveals that the competition between Google Spark and OpenClaw is not about which AI model is smarter, but rather about control over user workflows. Google Spark leverages its ecosystem of cloud serv…
-
Unsloth beta adds 2x faster inference, API calling, and MLX support
Unsloth has released version v0.1.405-beta, introducing significant performance enhancements and new features. The update includes up to 2x faster GGUF inference through MTP speculative decoding and adds API calling sup…
-
MLX achieves CUDA backend milestone, boosting GPU acceleration
Cheng announced a significant milestone for MLX, with all tests passing on its CUDA backend. This achievement enhances MLX's GPU acceleration and CUDA compatibility. It represents positive progress for integrating Apple…
-
Ollama v0.23.4 adds vision model support for opencode
Ollama has released version 0.23.4, introducing support for vision models with image inputs when launching the opencode application. This update also includes fixes for formatting Claude tool results when local image pa…
-
Apple's MLX framework accelerates local LLMs on Macs
Apple's MLX framework is significantly boosting local LLM performance on Apple Silicon Macs, outperforming tools like llama.cpp. LM Studio, a popular LLM frontend, now leverages MLX on Apple Silicon, offering a substant…
-
Local AI models lag hosted APIs due to complex setup and lack of polish
Armin Ronacher argues that while significant progress has been made in running AI models locally, the user experience for developers, particularly with coding agents, remains frustratingly complex. He highlights the gap…
-
Ollama v0.23.1 adds Gemma 4 MTP for faster coding on Macs
Ollama has released version 0.23.1, introducing support for Gemma 4 MTP (Multi-Token Prediction) with speculative decoding on Macs. This enhancement can reportedly double the speed for the Gemma 4 31B model when perform…
-
Qwen 35B model outperforms 27B on coding tasks, offering 8x speed boost
A user on Reddit's r/LocalLLaMA shared a benchmark comparing two versions of the Qwen 3.6 model on a MacBook Pro with an M5 Pro chip and 64GB of RAM. The 35B A3B model, using a 4-bit quantization, significantly outperfo…
-
Hugging Face model fixes Qwen chat templates for better tool use
A Hugging Face model repository, froggeric/Qwen-Fixed-Chat-Templates, has been updated with significant improvements to its chat templates for Qwen 3.5 and 3.6 models. These updates address issues such as "empty think" …
-
Apple researchers unveil parallel RNN training and enhanced SSMs at ICLR 2026
Apple researchers are presenting new work at ICLR 2026, focusing on advancements in recurrent neural networks (RNNs) and state space models (SSMs). Their paper "ParaRNN" introduces a parallelized training framework that…
-
Alibaba's Qwen3.5-397B-A17B model offers multimodal capabilities and efficient inference
Alibaba has released Qwen3.5-397B-A17B, an open-weight, natively multimodal model featuring a hybrid attention mechanism and sparse Mixture-of-Experts architecture. The model boasts support for 201 languages and demonst…
-
Moonshot Kimi K2.5 - Beats Sonnet 4.5 at half the cost, SOTA Open Model, first Native Image+Video, 100 parallel Agent Swarm manager
Moonshot has released Kimi K2.6, an updated open-weight model that enhances its capabilities in agentic coding and multimodal understanding. This new version boasts a 1T-parameter Mixture-of-Experts architecture with 32…
-
Yannic Kilcher critiques theoretical limits of embedding-based retrieval
A YouTube video analyzes the theoretical limitations of embedding-based retrieval, with the creator expressing strong opinions on the topic. Separately, a Mastodon post discusses libraries, databases, and models essenti…
-
Gemma 3n fully available in the open-source ecosystem!
Google DeepMind has fully released Gemma 3n, a mobile-first multimodal model designed for on-device applications. This new architecture supports image, audio, video, and text inputs, with text outputs, and is optimized …