Mixture-of-Experts (MoE) models
PulseAugur coverage of Mixture-of-Experts (MoE) models — every cluster mentioning Mixture-of-Experts (MoE) models across labs, papers, and developer communities, ranked by signal.
1 day with sentiment data
-
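Fireworks AI enables training of trillion-parameter MoE models
Fireworks AI has developed new training infrastructure that can fine-tune trillion-parameter Mixture-of-Experts (MoE) models, overcoming earlier memory and orchestration bottlenecks. The platform played a key role in the recently released Cursor Composer 2.5, a coding model that achieves top-tier performance on several benchmarks. The system uses techniques such as low-precision expert quantization and optimizer state offloading to manage the memory demands of large MoE models, making them easier to train and fine-tune.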
-
New benchmark DBES evaluates expert specialization in MoE models
Researchers have introduced DBES, a new benchmark and metric suite designed to systematically evaluate expert specialization within Mixture-of-Experts (MoE) models. This framework moves beyond traditional evaluations by…
-
AI production systems tackle MoE challenges with new optimization techniques
SemiAnalysis is highlighting production system challenges for large-scale AI models, particularly Mixture-of-Experts (MoE) architectures. They note that techniques like expert balancing and assigning dedicated resources…
-
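Anyscale adds Ray Serve fault tolerance for MoE models in vLLM
Anyscale has introduced new fault tolerance capabilities for the vLLM serving engine, which integrates with Ray Serve. The enhancement specifically addresses the challenge of deploying large Mixture-of-Experts (MoE) models that are sharded across multiple GPUs. When a single GPU in a data-parallel (DP) group fails, the new system identifies and restarts the entire set of GPUs that make up that DP group, preventing the whole deployment from becoming unavailable.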
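
The group-level restart described in the Anyscale item can be illustrated with a small pure-Python simulation. This is only a sketch of the general idea, not Anyscale's, Ray Serve's, or vLLM's implementation or API: the `DPGroup` class, its fields, and the `reconcile` function are hypothetical names invented for this example, and GPU health is modeled as a simple boolean flag.

```python
from dataclasses import dataclass, field


@dataclass
class DPGroup:
    """Hypothetical stand-in for one data-parallel group sharded across several GPUs."""
    group_id: int
    gpu_ids: list[int]
    healthy: dict[int, bool] = field(default_factory=dict)
    restarts: int = 0

    def __post_init__(self) -> None:
        # Every GPU in the group starts out healthy.
        self.healthy = {gpu: True for gpu in self.gpu_ids}

    def is_healthy(self) -> bool:
        # The group is usable only if all of its GPUs are healthy,
        # because the model shards depend on one another.
        return all(self.healthy.values())

    def restart(self) -> None:
        # Restart the whole group as a unit, not just the failed GPU.
        self.healthy = {gpu: True for gpu in self.gpu_ids}
        self.restarts += 1


def reconcile(groups: list[DPGroup]) -> list[int]:
    """Restart every group with at least one failed GPU; return the restarted group ids."""
    restarted = []
    for group in groups:
        if not group.is_healthy():
            group.restart()
            restarted.append(group.group_id)
    return restarted


# Two groups of four GPUs each; a single GPU in group 1 fails.
groups = [DPGroup(0, [0, 1, 2, 3]), DPGroup(1, [4, 5, 6, 7])]
groups[1].healthy[6] = False
restarted = reconcile(groups)
```

In this toy model only the group containing the failed GPU is restarted, while the other group keeps its state, which mirrors the summary's point that one GPU failure no longer takes down the whole deployment.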