Apple's machine learning research team has published a paper detailing SpecMD, a new framework for evaluating caching policies for Mixture-of-Experts (MoE) models. Their experiments show that traditional caching policies such as Least Recently Used (LRU) are ineffective for MoE models because expert access patterns are inconsistent. To address this, they propose a novel eviction policy, Least-Stale, which exploits predictability in expert access to significantly reduce cache misses and improve inference speed.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Introduces a novel caching policy that could significantly reduce inference costs and latency for Mixture-of-Experts models.
RANK_REASON This is a research paper detailing a new framework and caching policy for Mixture-of-Experts models.
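The summary does not spell out how Least-Stale works, so the following is only an illustrative sketch of the general idea of prediction-aware expert eviction, assuming the policy evicts the cached expert whose predicted next access lies farthest in the future (a Belady-style heuristic). The class and parameter names (`ExpertCache`, `predicted_next_use`) are hypothetical, not from the paper.

```python
class ExpertCache:
    """Toy expert-weight cache with a prediction-aware eviction policy.

    This is NOT the paper's algorithm; it is a minimal sketch of the idea
    that predictable expert access lets the cache evict the expert it will
    need last, instead of the least-recently-used one.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.cached = {}  # expert_id -> expert weights (placeholder strings here)

    def access(self, expert_id, predicted_next_use):
        """Fetch an expert; returns True on a cache hit.

        predicted_next_use: dict mapping expert_id -> predicted step of its
        next access, assumed to come from an upstream router predictor.
        """
        hit = expert_id in self.cached
        if not hit:
            if len(self.cached) >= self.capacity:
                # Evict the expert predicted to be needed farthest in the
                # future (unknown experts count as "never needed").
                victim = max(
                    self.cached,
                    key=lambda e: predicted_next_use.get(e, float("inf")),
                )
                del self.cached[victim]
            self.cached[expert_id] = f"weights[{expert_id}]"  # stand-in for real weights
        return hit
```

Under perfectly accurate predictions this heuristic matches the optimal offline eviction rule, which is why prediction-aware policies can beat LRU when expert routing is forecastable.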