SGLang has released version 0.3.1 of its model gateway, significantly boosting performance and reducing memory usage. The update introduces cache-aware routing that is 10-12x faster and uses 99% less memory, enabling 100x more cache entries within the same footprint. This release also incorporates enterprise-grade security features like JWT/OIDC authentication and adds support for classification workloads.
AI IMPACT Enhances efficiency and scalability for large-scale multi-tenant AI deployments.
RANK_REASON This is a software release for an infrastructure tool, not a frontier model release or significant industry event.
AI-generated summary · Google Gemini · from 1 source.