A new version of the GLM model, 5.2, has been released and offers significant speed improvements on Mac Studio hardware. This update allows for prefill speeds exceeding 100 tokens per second even with large context windows, and it also reduces memory usage. These enhancements enable users with 512GB Mac devices to run 4-bit quantized models with contexts larger than 100,000 tokens.
AI IMPACT Enhances performance for local LLM deployment on specific Apple hardware, enabling larger context windows for 4-bit quantized models.
RANK_REASON This is an update to a specific model version that improves performance on particular hardware, rather than a new frontier model release or significant industry-wide event.
AI-generated summary · Google Gemini · from 1 source.