SGLang
PulseAugur coverage of SGLang: every cluster mentioning SGLang across labs, papers, and developer communities, ranked by signal.
25 days with sentiment data
-
Liquid AI ships tiny LFM2.5-230M for on-device agent tasks
Liquid AI has released LFM2.5-230M, its smallest model to date, designed for on-device inference on edge hardware like phones and robots. This 230-million-parameter model excels at data extraction and tool use, outperfo…
-
DeepSeek unveils V4 models with 1M token context and MoE architecture · 3 sources tracked
DeepSeek has released preview versions of its DeepSeek-V4 series, featuring two Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro and DeepSeek-V4-Flash. Both models support a one-million-token contex…
-
New sampler-verifier system boosts small LLM coding performance
A new research paper introduces a sampler-and-verifier system that significantly improves the coding performance of small language models. This approach could bring a 0.5-billion-parameter model up to the leve…
-
Users discuss large model performance on RTX 6000 Ada PRO GPUs
A Reddit discussion explores the performance of large language models like GLM 5.2, Kimi 2.7, and DeepSeek V4 Pro on high-end GPU setups featuring 4x or 8x NVIDIA RTX 6000 Ada Generation PRO cards. Users are sharing …
-
Unsloth releases Qwen-AgentWorld-35B model with broad integration support
The unsloth/Qwen-AgentWorld-35B-A3B-GGUF model is now available on Hugging Face, with instructions for integrating it with various libraries and inference providers. The model can be used with tools such as T…
-
LiquidAI releases compact LFM2.5-230M for on-device AI tasks
LiquidAI has released LFM2.5-230M, a compact language model designed for on-device deployment. The model has 230 million parameters and is optimized for efficient inference on various hardware, including CPUs and ed…
-
NVIDIA NeMo AutoModel accelerates AI model fine-tuning
NVIDIA has released NeMo AutoModel, an open library integrated with its NeMo framework, designed to significantly accelerate the fine-tuning of large Mixture-of-Experts (MoE) AI models. This new tool builds upon Hugging…
-
VoltanaLLM system cuts LLM inference energy use by 36% while meeting SLOs
A new system called VoltanaLLM has been developed to address the significant energy consumption of Large Language Model (LLM) inference. This system, detailed in a recent arXiv paper, employs adaptive frequency control …
-
Fireworks AI offers frontier RL infrastructure as a managed service
Fireworks AI is launching a new managed service that provides specialized infrastructure for reinforcement learning on frontier models. This service addresses the complex challenge of ensuring numerical consistency betw…
-
NVIDIA releases quantized GLM-5.2 MoE model with 1M context
NVIDIA has released the GLM-5.2 NVFP4 model, a quantized version of ZAI's GLM-5.2. This Mixture-of-Experts model is optimized for reasoning and coding tasks, featuring sparse attention and a 1 million token context leng…
-
Kamera method enhances multimodal AI efficiency with position-invariant KV cache
Researchers have developed a new method called Kamera that addresses the inefficiency of multimodal AI agents re-encoding information from repeated video frames or UI screenshots. This technique introduces a training-fr…
-
DeepSeek-v4-Fable, a security-focused AI model, released on Hugging Face
The Chunjiang-Intelligence/DeepSeek-v4-Fable model, a distilled version of Claude-5-Fable, is now available on Hugging Face. This model is specifically engineered for offensive security research, focusing on tasks like …
-
Alibaba Qwen releases open-source language world model for AI agents · 4 sources tracked
Alibaba's Qwen team has released Qwen-AgentWorld-35B-A3B, an open-source language world model designed for simulating agentic environments. This model, featuring a Mixture-of-Experts architecture with 35 billion total p…
-
MoonMath AI open-sources HIP attention kernel for AMD MI300X, beating AITER v3
MoonMath AI has open-sourced a new bf16 forward attention kernel for AMD's MI300X GPU, written in HIP. This kernel reportedly outperforms AMD's own AITER v3 across various configurations, achieving up to a 1.26x speedup…
-
New speculative decoding methods boost LLM inference speed and safety
Researchers are developing advanced speculative decoding techniques to accelerate large language model inference. HyperDFlash optimizes decoding for DeepSeek-V4's multi-hyper-connection architecture, improving draft acc…
-
DeepReinforce AI releases Ornith-1.0 family of open-source coding models
DeepReinforce AI has released the Ornith-1.0 family of open-source models, designed for agentic coding tasks. The models, available in various sizes including 9B, 35B, and 397B parameters, are built upon Gemma 4 and Qwe…
-
huihui-ai releases multimodal model huihui-ai/Huihui-gemma-4-12B-coder-fable5-composer2.5-v1-abliterated on Hugging Face
The model huihui-ai/Huihui-gemma-4-12B-coder-fable5-composer2.5-v1-abliterated has been released on Hugging Face, offering multimodal capabilities. The model can be integrated with various libraries and inference provid…
-
Jackrong/Qwopus3.6-27B-Coder-Compat-MTP-GGUF model released on Hugging Face with integration guides
The Jackrong/Qwopus3.6-27B-Coder-Compat-MTP-GGUF model is now available on Hugging Face, with instructions for integrating it with various libraries and inference providers. The model can be used with tools s…
-
SGLang merges MUSA backend, boosting GPU support in China's open-source AI ecosystem
SGLang and the MUSA community have merged MUSA's backend into SGLang, enabling native GPU support for China's open-source AI ecosystem. This collaboration was celebrated at their first offline meetup, signaling a new ph…
-
Datalab-to/lift model enables structured JSON extraction from images and PDFs
The datalab-to/lift model, available on Hugging Face, is designed for structured data extraction from PDFs and images. It can generate JSON output that adheres to a specified schema, utilizing schema-constrained decodin…