IonRouter has launched a new inference stack called IonAttention, designed to multiplex models on a single GPU for high throughput and low cost, with compatibility for NVIDIA Grace Hopper. Separately, RunAnywhere has released RCLI, an on-device voice AI for macOS that runs inference locally on Apple Silicon using its proprietary MetalRT engine, offering features such as local RAG and VLM capabilities.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT These launches offer new options for optimizing AI inference costs and performance, both in cloud and on-device environments.
RANK_REASON The cluster describes new products and infrastructure for AI inference, but neither a novel model release nor a significant industry-wide shift.