vLLM
PulseAugur coverage of vLLM — every cluster mentioning vLLM across labs, papers, and developer communities, ranked by signal.
- used by graphics processing unit 90%
- used by H.1000 Gnome 80%
- used by llama-cpp-python 70%
- used by FP8 70%
- used by Horizon 2020 70%
- uses Anyscale, Inc. 70%
- competes with Text Generation Inference 60%
- used by MLX 60%
- uses LM Studio 60%
- affiliated with Anyscale, Inc. 50%
- affiliated with LM Studio 50%
- affiliated with llama-cpp-python 50%
- 2026-05-15 product_launch vLLM released version 0.21.1rc0.
15 days with sentiment data
-
Oracle secures $300B OpenAI contract, boosting OCI revenue growth
Oracle's cloud infrastructure division announced a significant surge in revenue bookings, reaching $455 billion, largely due to a substantial contract with OpenAI. This deal positions Oracle as a key player in providing…
-
MiniMax 2.7: GLM-5 at 1/3 cost SOTA Open Model
MiniMax has released MiniMax 2.7, an open-source model that matches the performance of Z.ai's GLM-5 on several benchmarks but at a significantly lower cost. The model is noted for its efficiency and claims to be the fir…
-
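New simulators and frameworks enhance LLM training, inference, and fine-tuning
Researchers have developed several new tools and frameworks to improve the efficiency and accuracy of large language model (LLM) operations. Charon and Frontier are simulators designed to predict LLM training and inference performance with high accuracy, which helps guide optimization work. FT-Dojo provides a benchmark environment for autonomous LLM fine-tuning, while rePIRL offers a framework inspired by inverse reinforcement learning for learning process reward models. In addition, PALS focuses on power-aware LLM serving for mixture-of-experts models, and LlamaWeb uses WebGPU to enable memory-efficient LLM inference in the web browser.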
-
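Graft and FlexDraft boost LLM speed with new speculative decoding methods
Two new research papers, Graft and FlexDraft, introduce advanced speculative decoding techniques to accelerate large language model inference. Graft combines pruning with retrieval to fill the gaps left by pruned branches, achieving significant speedups without training. FlexDraft uses attention adjustment and reward-guided calibration to adapt flexibly to different batch sizes, mitigating draft-verification mismatch and improving throughput. These methods aim to overcome the latency-cost trap in LLM deployment by allowing high-quality responses to be served at close to small-model speed.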