Entity SGLang

SGLang

PulseAugur coverage of SGLang — every cluster mentioning SGLang across labs, papers, and developer communities, ranked by signal.


Total · 30 days

31 in 90 days

Releases · 30 days

0 in 90 days

Papers · 30 days

12 in 90 days

Tier distribution · 90 days

frontier release 3
significant 3
research 7
tool 17
commentary 1

Relationships

used by vLLM 70%
used by graphics processing unit 70%
competes with vLLM 50%
used by transformers 50%

Timeline

2026-01-09 product_launch SGLang released version 0.3.1 of its model gateway, featuring performance and memory improvements.

Sentiment · 30 days

10 days with sentiment data

Recent · Page 2 of 2 · 31 items total

RESEARCH · CL_09107 · Apr 29 · 13:19

Stateful Transformers boost streaming inference; Intel releases AutoRound quantization toolkit

A new paper introduces a stateful transformer inference engine that significantly speeds up processing for streaming data by maintaining a persistent KV cache. This approach allows for query latency that is independent …
RESEARCH · CL_05379 · Apr 27 · 10:01

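AI models gain tool-calling improvements and bug fixes

A new tool has been built to meet a need raised by Andrej Karpathy, reportedly in just 48 hours. Separately, a bug in the open-source SGLang inference engine that affected DeepSeek V4 output has been fixed, and the tool-calling ability of NousResearch's Ornstein-Hermes-3.6-27B model has been improved.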
RESEARCH · CL_14463 · Apr 27 · 04:00

New research explores LLM security, efficiency, and training optimization

Researchers are developing novel methods to enhance the efficiency and security of Large Language Models (LLMs). One approach, "Widening the Gap," exploits outlier injection to compromise LLM quantization, demonstrating…
SIGNIFICANT · CL_48047 · Apr 27 · 00:00

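Fireworks AI releases DeepSeek V4 Pro after fixing critical bugs

Fireworks AI has released DeepSeek V4 Pro, an open-source model with notable gains in long-context reasoning, agentic performance, and inference efficiency. The model uses a Mixture-of-Experts architecture and a 1M-token context window, and is designed to handle extensive state and complex agentic workflows cost-effectively. Fireworks AI delayed the public release to resolve critical serving-path correctness issues that caused degraded reasoning and corrupted output, so that the model was production-ready at launch.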
RESEARCH · CL_03565 · Apr 25 · 16:31

GLM 5.1 achieves 40 tokens/sec locally on RTX 6000 Pro cards

A user on the r/LocalLLaMA subreddit has optimized the GLM 5.1 model for local deployment and reports strong performance. By applying specific patches to the SGLang inference software and utilizi…
SIGNIFICANT · CL_48566 · Apr 14 · 04:23

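Moonshot AI releases Kimi K2.6 multimodal agentic model

Moonshot AI has released Kimi K2.6, an open-source multimodal model designed for advanced agentic tasks. The model shows marked improvement in long-horizon coding across multiple languages and domains. Kimi K2.6 also excels at generating production-ready interfaces and full-stack workflows from prompts and visual inputs, with attention to aesthetic precision.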
FRONTIER RELEASE · CL_47594 · Apr 13 · 09:12

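Qwen releases 27B multimodal model for advanced coding

Qwen has released Qwen3.6-27B, a dense multimodal model with 27 billion parameters designed for advanced coding tasks. The model aims to deliver flagship-level agentic coding performance, surpassing earlier open-source models in its class. Community members have already published various quantized versions of Qwen3.6-27B on Hugging Face, making it easier to use across platforms and libraries.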
TOOL · CL_48049 · Jan 9 · 06:18

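SGLang boosts model gateway performance with cache-aware routing

SGLang has released version 0.3.1 of its model gateway, significantly improving performance and reducing memory usage. The update introduces cache-aware routing that is 10-12x faster and uses 99% less memory, allowing 100x more cache entries in the same footprint. The release also integrates enterprise-grade security features such as JWT/OIDC authentication and adds support for classification workloads.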
FRONTIER RELEASE · CL_40513 · Dec 15 · 00:00

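NVIDIA Nemotron Diffusion models deliver 6.4x faster AI inference

NVIDIA has released the Nemotron-Labs Diffusion family of language models in 3B, 8B, and 14B parameter sizes. The models support autoregressive (AR), diffusion, and self-speculative decoding modes within a single architecture, yielding significant speedups. By generating blocks of tokens in parallel rather than sequentially, Nemotron-Labs Diffusion achieves 6.4x higher throughput than conventional AR models while maintaining or improving accuracy. This addresses the memory-bandwidth bottleneck inherent to AR models, making the models more efficient in production deployments and agentic systems.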
FRONTIER RELEASE · CL_01752 · Jul 28 · 05:44

MiniMax 2.7: GLM-5 at 1/3 cost SOTA Open Model

MiniMax has released MiniMax 2.7, an open-source model that matches the performance of Z.ai's GLM-5 on several benchmarks but at a significantly lower cost. The model is noted for its efficiency and claims to be the fir…
FRONTIER RELEASE · CL_00821 · Jan 19 · 04:00

DeepSeek v3 leads open-weight models, Baseten enables mission-critical inference

DeepSeek v3, a new 671B parameter Mixture-of-Experts model, has been released and is currently the top-performing open-weights model available. Serving such large models presents significant challenges, but inference st…
