IonRouter has launched IonAttention, a new inference stack designed to multiplex models on a single GPU for high throughput and low cost; it is compatible with NVIDIA Grace Hopper. Separately, RunAnywhere has released RCLI, an on-device voice AI for macOS that runs inference locally on Apple Silicon using its proprietary MetalRT engine and offers features such as local RAG and VLM capabilities.
Impact: These launches offer new options for optimizing AI inference cost and performance, both in the cloud and on-device.
Ranking rationale: The cluster describes new products and infrastructure for AI inference, not a novel model release or a significant industry-wide shift.
Read on HN (AI infrastructure stories)
- Apple Silicon
- GPT-OSS
- Grace Hopper
- IonAttention
- IonRouter
- llama.cpp
- MetalRT
- MiniMax
- MoonShot AI
- NVIDIA
- RunAnywhere
AI-generated summary · Google Gemini · from 2 sources.