mixture of experts
PulseAugur coverage of mixture of experts — every cluster mentioning mixture of experts across labs, papers, and developer communities, ranked by signal.
- instance of arXiv 90%
- instance of Innu-aimun 90%
- instance of DeepSeek V4-Flash 90%
- uses large-language models 80%
- used by large-language models 80%
- instance of transformers 70%
- uses LLM 70%
- used by transformer 70%
- instance of transformer 70%
- used by Emo 70%
- developed by Emo 70%
- competes with transformers 50%
- 2026-05-11 research_milestone A new paper proposes an enhanced Mixture-of-Experts framework for faster time series forecasting model training.
Sentiment data available for 12 days
-
NVIDIA open-sources cuDNN kernels after 12 years, including MoE and sparse attention
NVIDIA has open-sourced parts of its cuDNN library, a significant move after 12 years of it being closed-source. This release includes over 20 Mixture-of-Experts (MoE) kernels and NSA sparse attention kernels. The codeb…
-
SMoE paper proposes expert substitution for efficient edge MoE deployment
Researchers have developed SMoE, a novel algorithm-system co-design aimed at enabling Mixture of Experts (MoE) models to run on edge devices. This approach tackles memory limitations by dynamically offloading experts an…
-
Apple researchers unveil SpecMD for faster MoE model inference
Apple's machine learning research team has published a paper detailing SpecMD, a new framework for evaluating Mixture-of-Experts (MoE) model caching policies. Their experiments show that traditional caching assumptions …
-
RD-ViT cuts data needs for segmentation, outperforming standard ViT with fewer parameters
Researchers have developed RD-ViT, a novel Recurrent-Depth Vision Transformer designed for semantic segmentation tasks. This architecture significantly reduces data dependence by using a single, shared transformer block…
-
OneTrackerV2 unifies multimodal visual tracking with Dual Mixture-of-Experts
Researchers have developed a new event-based visual object tracking framework that addresses limitations of existing methods by explicitly modeling event density variations across multiple temporal scales. This approach…
-
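RAST-MoE-RL framework improves ride-hailing efficiency with specialized AI experts
Researchers have developed a new framework called RAST-MoE-RL to improve the efficiency of ride-hailing services. The framework applies a Mixture-of-Experts (MoE) approach within deep reinforcement learning to better handle the complex, dynamic supply and demand conditions typical of ride-hailing platforms. By letting specialized experts adapt to different operating scenarios, the system aims to reduce matching and pickup delays, and it outperforms existing methods with significantly fewer parameters.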
-
Attention Sink research reveals inherent MoE structure in LLM attention layers
Researchers have identified that the attention sink phenomenon in Large Language Models, where the first token receives disproportionate attention, naturally forms a Mixture-of-Experts (MoE) mechanism within attention l…
-
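Xiaomi releases MiMo-V2.5-Pro AI model for automating programming tasks
Xiaomi has released its MiMo-V2.5-Pro language model, which is designed to automate complex programming tasks. Using a Mixture-of-Experts architecture and reduced token requirements, the model can handle workflows that previously took hours to complete. The release positions Xiaomi as a competitor to the established market leaders in generative AI.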
-
Mamoda2.5 model integrates multimodal AI with efficient DiT-MoE for top video editing
Researchers have introduced Mamoda2.5, a unified AR-Diffusion framework designed for multimodal understanding and generation. This model utilizes a Diffusion Transformer backbone enhanced with a Mixture-of-Experts (MoE)…
-
Researchers explore quantum neural networks via mixture of experts
Researchers have established a mean-field limit for Mixture of Experts (MoE) models trained using gradient flow in supervised learning scenarios. Their findings demonstrate that as the number of experts increases, the m…
-
GMGaze model achieves SOTA gaze estimation with CLIP and multiscale transformer
Researchers have introduced GMGaze, a novel approach to gaze estimation that utilizes a multi-scale transformer architecture and incorporates context-aware conditioning. This method addresses limitations in existing mod…
-
LightKV reduces LVLM KV cache size and computation by compressing vision tokens
Researchers have developed LightKV, a new method to reduce the GPU memory overhead associated with Large Vision-Language Models (LVLMs). By exploiting redundancy in vision-token embeddings and using prompt-aware guidanc…
-
FluxMoE system decouples expert weights for faster LLM serving
Researchers have developed FluxMoE, a new system designed to improve the efficiency of serving Mixture-of-Experts (MoE) models. FluxMoE addresses the challenge of large parameter sizes in MoE models by decoupling expert…
-
Study finds switchless networks more cost-effective for MoE LLM serving
A new paper analyzes network topologies for Mixture-of-Experts (MoE) Large Language Model (LLM) serving, finding that lower-cost, switchless networks can be more cost-effective than expensive scale-up infrastructures. T…
-
Mixture of Experts framework speeds up atomistic simulations
Researchers have developed a new Mixture-of-Experts (MoE) framework for Machine Learning Interatomic Potentials (MLIPs) to accelerate atomistic simulations. This approach divides simulation domains into regions of varyi…
-
Liquid AI releases LFM2-24B-A2B, an efficient 24B parameter MoE model
Liquid AI has released an early checkpoint of its LFM2-24B-A2B model, a sparse Mixture of Experts (MoE) architecture with 24 billion total parameters and 2 billion active parameters per token. This model demonstrates th…
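To make the "total versus active parameters" distinction concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is an illustration only and is not Liquid AI's routing scheme: the toy scalar experts, the router logits, and the function names `top_k_route`, `moe_forward`, and `active_fraction` are all hypothetical. Only the 24 billion and 2 billion figures come from the summary above.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are softmax probabilities renormalized over the chosen experts.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def moe_forward(x, router_logits, experts, k):
    """Combine the outputs of only the k selected experts for one token."""
    routes = top_k_route(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in routes)


def active_fraction(total_params, active_params):
    """Share of the model's parameters used per token."""
    return active_params / total_params


# Hypothetical toy setup: 8 scalar "experts", 2 of them active per token.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in experts]

routes = top_k_route(logits, k=2)
y = moe_forward(1.0, logits, experts, k=2)
print("selected experts and weights:", routes)
print("combined output:", y)

# Figures from the summary: 24B total parameters, 2B active per token.
print("active share per token:", active_fraction(24e9, 2e9))
```

With these figures, roughly one twelfth of the parameters participate in each token's forward pass, which is where the efficiency of a sparse MoE comes from: the unselected experts are never evaluated.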
-
FaaSMoE offers resource-efficient, serverless serving for multi-tenant Mixture-of-Experts models
Researchers have developed FaaSMoE, a novel serverless framework designed for serving Mixture-of-Experts (MoE) models in multi-tenant environments. This architecture deploys individual experts as stateless functions on …
-
New framework uses physics-informed transfer learning for multi-site emission control
Researchers have developed a new physics-informed transfer learning framework designed to improve emission control in municipal solid waste incineration. This framework utilizes a mixture-of-experts model to manage carb…
-
Mixture-of-Experts model applied to GlueX DIRC detector for physics analysis
Researchers have developed a Mixture-of-Experts (MoE) foundation model to streamline data analysis for the GlueX DIRC detector at Jefferson Lab. This unified framework handles fast simulation, particle identification, a…
-
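DeepSeek-V4 Pro model with 1.6T parameters now live on Together AI
DeepSeek-V4 Pro, a large Mixture-of-Experts model with 1.6 trillion parameters, is now available on the Together AI platform. Designed for long-context reasoning, the model supports a context window of up to 512K tokens in its initial Together AI deployment, with support for a 1M-token context window planned. It offers controllable reasoning modes that can be optimized for speed or depth, and it provides dedicated pricing for cached input tokens to reduce the cost of repeated queries.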