PulseAugur

SGLang

PulseAugur coverage of SGLang — every cluster mentioning SGLang across labs, papers, and developer communities, ranked by signal.

Total · 30d: 58 (58 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 16 (16 over 90d)
TIMELINE
  1. 2026-01-09 product_launch SGLang released version 0.3.1 of its model gateway, featuring performance and memory improvements.
SENTIMENT · 30D

19 days with sentiment data

RECENT · PAGE 3/3 · 58 TOTAL
  1. TOOL · CL_19382 ·

    MI355x on SGLang boosts DeepSeek V4 Pro throughput over 10x per GPU

    DeepSeek V4 Pro has achieved more than a tenfold improvement in throughput per GPU. The increase came from integrating MI355x with the SGLang framework. The gains re…

  2. TOOL · CL_16238 ·

    Aurora system unifies RL training and serving for faster LLM inference

    Researchers have developed Aurora, a novel system that unifies the training and serving of speculative decoding for large language models. This approach addresses the delays and performance degradation associated with t…

  3. RESEARCH · CL_74484 ·

    Gemma 4 QAT models spark debate over performance and utility

    Users are discussing the performance and utility of Gemma 4 QAT (Quantization-Aware Training) models, particularly comparing them to standard quantizations. While some users report improved speed and quality for general…

  4. RESEARCH · CL_11567 ·

    Moore Threads completes full-link engineering adaptation for DeepSeek-V4

    Moore Threads has successfully adapted the DeepSeek-V4 large language model to run on its flagship AI training and inference accelerator card, the MTT S5000. This integration was achieved using the company's proprietary…

  5. RESEARCH · CL_14133 ·

    EVICT method speeds up MoE speculative decoding by optimizing verification

    Researchers have developed EVICT, a new method to improve the efficiency of speculative decoding for Mixture-of-Experts (MoE) models. This technique adaptively truncates the draft tree during verification, focusing on c…

  6. RESEARCH · CL_10143 ·

    Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation

    Researchers have developed UniPrefill, a novel framework designed to accelerate the prefill stage of long-context language models. Unlike previous methods that primarily benefit full-attention models, UniPrefill works a…

  7. RESEARCH · CL_09151 ·

    SGLang AI inference server hit with critical CVE-2026-5760 vulnerability

    A critical security vulnerability (CVE-2026-5760) with a severity score of 9.8 has been identified in SGLang, an AI inference server. The issue arises from a poisoned GGUF model file containing a chat-template that SGLa…

  8. RESEARCH · CL_09107 ·

    Stateful Transformers boost streaming inference; Intel releases AutoRound quantization toolkit

    A new paper introduces a stateful transformer inference engine that significantly speeds up processing for streaming data by maintaining a persistent KV cache. This approach allows for query latency that is independent …

  9. RESEARCH · CL_05379 ·

    AI models see tool-calling improvements and bug fixes

    A new tool has been developed that addresses a need identified by Andrej Karpathy, with its creation reportedly taking only 48 hours. Separately, a bug affecting DeepSeek V4's output in the SGLang open-source inference …

  10. RESEARCH · CL_14463 ·

    New research explores LLM security, efficiency, and training optimization

    Researchers are developing novel methods to enhance the efficiency and security of Large Language Models (LLMs). One approach, "Widening the Gap," exploits outlier injection to compromise LLM quantization, demonstrating…

  11. SIGNIFICANT · CL_48047 ·

    Fireworks AI releases DeepSeek V4 Pro after fixing critical bugs

    Fireworks AI has released DeepSeek V4 Pro, an open-source model notable for its advancements in long-context reasoning, agentic performance, and inference efficiency. The model features a mixture-of-experts architecture…

  12. RESEARCH · CL_03565 ·

    GLM 5.1 achieves 40 tokens/sec locally on RTX 6000 Pro cards

    A user on the r/LocalLLaMA subreddit has optimized the GLM 5.1 model for local deployment, reaching 40 tokens/sec on RTX 6000 Pro cards. By applying specific patches to the SGLang inference software and utilizi…

  13. SIGNIFICANT · CL_48566 ·

    Moonshot AI releases Kimi K2.6 multimodal agentic model

    Moonshot AI has released Kimi K2.6, an open-source multimodal model designed for advanced agentic tasks. This model demonstrates significant improvements in long-horizon coding across multiple languages and domains. Kim…

  14. FRONTIER RELEASE · CL_47594 ·

    Qwen releases 27B multimodal model for advanced coding

    Qwen has released Qwen3.6-27B, a dense 27-billion-parameter multimodal model designed for advanced coding tasks. This model aims to provide flagship-level agentic coding performance, surpassing previous open-source mode…

  15. TOOL · CL_48049 ·

    SGLang boosts model gateway performance with cache-aware routing

    SGLang has released version 0.3.1 of its model gateway, significantly boosting performance and reducing memory usage. The update introduces cache-aware routing that is 10-12x faster and uses 99% less memory, enabling 10…

  16. FRONTIER RELEASE · CL_40513 ·

    NVIDIA Nemotron Diffusion models offer 6.4x faster AI inference

    NVIDIA has released the Nemotron-Labs Diffusion family of language models, available in 3B, 8B, and 14B parameter sizes. These models uniquely support autoregressive (AR), diffusion, and self-speculation decoding modes …

  17. FRONTIER RELEASE · CL_01752 ·

    MiniMax 2.7: GLM-5 at 1/3 cost SOTA Open Model

    MiniMax has released MiniMax 2.7, an open-source model that matches the performance of Z.ai's GLM-5 on several benchmarks but at a significantly lower cost. The model is noted for its efficiency and claims to be the fir…

  18. FRONTIER RELEASE · CL_00821 ·

    DeepSeek V3 leads open-weight models, Baseten enables mission-critical inference

    DeepSeek V3, a new 671B-parameter Mixture-of-Experts model, has been released and is currently the top-performing open-weights model available. Serving such large models presents significant challenges, but inference st…