IonRouter has launched an inference service aimed at high throughput and low cost, built on its proprietary IonAttention engine. The engine multiplexes multiple models on a single GPU, which allows rapid model switching and real-time adaptation to traffic. The service supports a range of open-source models and fine-tunes, with per-second billing and minimal cold-start times, making it suited to applications such as robotics and real-time video analysis.
Impact: Offers a potentially cheaper and faster inference option for deploying open-source and fine-tuned models.
Ranking rationale: This is the launch of an AI inference service that integrates existing models, rather than a new foundation model release.
Read on HN: AI infrastructure stories →
- Black Forest Labs
- Cumulus
- EAGLE
- FastGen
- Flux Schnell
- GLM-5
- GPT-OSS-120B
- Grace Hopper
- IonAttention
- IonRouter
- Kimi-K2.5
- LoRA
- MiniMax-M2.5
- MoonShot AI
- NVIDIA
- Qwen2.5-7B
- Qwen3.5-122B-A10B
- Wan2.2
- ZhiPu AI
AI-generated summary · Google Gemini · from 1 source.