SGLang

PulseAugur coverage of SGLang — every cluster mentioning SGLang across labs, papers, and developer communities, ranked by signal.

Total · 30d

58 over 90d

Releases · 30d

0 over 90d

Papers · 30d

16 over 90d

TIER MIX · 90D

frontier release 9
significant 4
research 12
tool 30
commentary 2
meme 1

TOPICS

product 33
infra 32
model release 30
paper 16
other 3
safety 2
funding 1

RELATIONSHIPS

used by vLLM 70%
used by transformers 70%
used by graphics processing unit 70%
used by Ollama 70%
used by llama-cpp-python 60%
affiliated with vLLM 50%
affiliated with transformers 50%
competes with vLLM 50%
used by llama.cpp 50%
used by Raspberry Pi 50%

TIMELINE

2026-01-09 product_launch SGLang released version 0.3.1 of its model gateway, featuring performance and memory improvements.

SENTIMENT · 30D

19 days with sentiment data

RECENT · PAGE 3/3 · 58 TOTAL

TOOL · CL_19382 · May 6 · 13:00

SGLang on MI355x boosts DeepSeek V4 Pro throughput over 10x per GPU

DeepSeek V4 Pro has achieved a more than tenfold improvement in throughput per GPU. The advance came from running the model on MI355x hardware with the SGLang framework. The gains re…
TOOL · CL_16238 · May 5 · 04:00

Aurora system unifies RL training and serving for faster LLM inference

Researchers have developed Aurora, a novel system that unifies the training and serving of speculative decoding for large language models. This approach addresses the delays and performance degradation associated with t…
RESEARCH · CL_74484 · May 1 · 04:26

Gemma 4 QAT models spark debate over performance and utility

Users are discussing the performance and utility of Gemma 4 QAT (Quantization Aware Training) models, particularly comparing them to standard quantizations. While some users report improved speed and quality for general…
RESEARCH · CL_11567 · May 1 · 03:46

Moore Threads completes full-link engineering adaptation for DeepSeek-V4

Moore Threads has successfully adapted the DeepSeek-V4 large language model to run on its flagship AI training and inference accelerator card, the MTT S5000. This integration was achieved using the company's proprietary…
RESEARCH · CL_14133 · May 1 · 01:52

EVICT method speeds up MoE speculative decoding by optimizing verification

Researchers have developed EVICT, a new method to improve the efficiency of speculative decoding for Mixture-of-Experts (MoE) models. This technique adaptively truncates the draft tree during verification, focusing on c…
RESEARCH · CL_10143 · Apr 30 · 04:00

Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation

Researchers have developed UniPrefill, a novel framework designed to accelerate the prefill stage of long-context language models. Unlike previous methods that primarily benefit full-attention models, UniPrefill works a…
RESEARCH · CL_09151 · Apr 29 · 14:10

SGLang AI inference server hit with critical CVE-2026-5760 vulnerability

A critical security vulnerability (CVE-2026-5760) with a severity score of 9.8 has been identified in SGLang, an AI inference server. The issue arises from a poisoned GGUF model file containing a chat-template that SGLa…
RESEARCH · CL_09107 · Apr 29 · 13:19

Stateful Transformers boost streaming inference; Intel releases AutoRound quantization toolkit

A new paper introduces a stateful transformer inference engine that significantly speeds up processing for streaming data by maintaining a persistent KV cache. This approach allows for query latency that is independent …
RESEARCH · CL_05379 · Apr 27 · 10:01

AI models see tool-calling improvements and bug fixes

A new tool has been developed that addresses a need identified by Andrej Karpathy, with its creation reportedly taking only 48 hours. Separately, a bug affecting DeepSeek V4's output in the SGLang open-source inference …
RESEARCH · CL_14463 · Apr 27 · 04:00

New research explores LLM security, efficiency, and training optimization

Researchers are developing novel methods to enhance the efficiency and security of Large Language Models (LLMs). One approach, "Widening the Gap," exploits outlier injection to compromise LLM quantization, demonstrating…
SIGNIFICANT · CL_48047 · Apr 27 · 00:00

Fireworks AI releases DeepSeek V4 Pro after fixing critical bugs

Fireworks AI has released DeepSeek V4 Pro, an open-source model notable for its advancements in long-context reasoning, agentic performance, and inference efficiency. The model features a mixture-of-experts architecture…
RESEARCH · CL_03565 · Apr 25 · 16:31

GLM 5.1 achieves 40 tokens/sec locally on RTX 6000 Pro cards

A user on the r/LocalLLaMA subreddit has optimized the GLM 5.1 model for local deployment, reaching 40 tokens/sec on RTX 6000 Pro cards. By applying specific patches to the sglang inference software and utilizi…
SIGNIFICANT · CL_48566 · Apr 14 · 04:23

Moonshot AI releases Kimi K2.6 multimodal agentic model

Moonshot AI has released Kimi K2.6, an open-source multimodal model designed for advanced agentic tasks. This model demonstrates significant improvements in long-horizon coding across multiple languages and domains. Kim…
FRONTIER RELEASE · CL_47594 · Apr 13 · 09:12

Qwen releases 27B multimodal model for advanced coding

Qwen has released Qwen3.6-27B, a dense 27-billion-parameter multimodal model designed for advanced coding tasks. This model aims to provide flagship-level agentic coding performance, surpassing previous open-source mode…
TOOL · CL_48049 · Jan 9 · 06:18

SGLang boosts model gateway performance with cache-aware routing

SGLang has released version 0.3.1 of its model gateway, significantly boosting performance and reducing memory usage. The update introduces cache-aware routing that is 10-12x faster and uses 99% less memory, enabling 10…
FRONTIER RELEASE · CL_40513 · Dec 15 · 00:00

NVIDIA Nemotron Diffusion models offer 6.4x faster AI inference

NVIDIA has released the Nemotron-Labs Diffusion family of language models, available in 3B, 8B, and 14B parameter sizes. These models uniquely support autoregressive (AR), diffusion, and self-speculation decoding modes …
FRONTIER RELEASE · CL_01752 · Jul 28 · 05:44

MiniMax 2.7: GLM-5 at 1/3 cost SOTA Open Model

MiniMax has released MiniMax 2.7, an open-source model that matches the performance of Z.ai's GLM-5 on several benchmarks but at a significantly lower cost. The model is noted for its efficiency and claims to be the fir…
FRONTIER RELEASE · CL_00821 · Jan 19 · 04:00

DeepSeek v3 leads open-weight models, Baseten enables mission-critical inference

DeepSeek v3, a new 671B parameter Mixture-of-Experts model, has been released and is currently the top-performing open-weights model available. Serving such large models presents significant challenges, but inference st…
