mixture of experts
PulseAugur coverage of mixture of experts — every cluster mentioning mixture of experts across labs, papers, and developer communities, ranked by signal.
- instance of Mixture of Experts (MoE) 95%
- instance of Emo 95%
- instance of arXiv 90%
- used by large-language models 90%
- instance of Innu-aimun 90%
- used by SGLang 90%
- instance of DeepSeek V4-Flash 90%
- uses large-language models 80%
- instance of large-language models 70%
- instance of transformers 70%
- instance of LLM 70%
- used by LLM 70%
- 2026-05-11 (research milestone): A new paper proposes an enhanced Mixture-of-Experts framework for faster time series forecasting model training.
-
New parameter E predicts Mixture-of-Experts model health, preventing dead experts.
Researchers have introduced a new dimensionless control parameter, E = T*H/(O+B), to predict the health of expert ecologies in Mixture-of-Experts (MoE) models. This parameter, derived from four hyperparameters, can prev…
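The control parameter itself is simple arithmetic. The summary does not say which hyperparameters T, H, O and B stand for, so the sketch below treats them as opaque positive numbers. The helper `dead_experts` and its definition of a "dead" expert (one that receives zero tokens in a batch of router assignments) are illustrative assumptions, not the paper's own diagnostic.

```python
from collections import Counter

def control_parameter(T: float, H: float, O: float, B: float) -> float:
    """E = T*H / (O + B) as stated in the summary; what each argument means is not given there."""
    if O + B == 0:
        raise ValueError("O + B must be non-zero")
    return T * H / (O + B)

def dead_experts(assignments: list[int], num_experts: int) -> list[int]:
    """Hypothetical diagnostic: experts that received no tokens in a batch of top-1 router assignments."""
    load = Counter(assignments)  # expert id -> number of tokens routed to it (missing ids count as 0)
    return [e for e in range(num_experts) if load[e] == 0]

print(control_parameter(T=2.0, H=3.0, O=1.0, B=2.0))   # 2.0
print(dead_experts([0, 0, 1, 3, 1, 0], num_experts=4))  # [2]
```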
-
Zyphra's ZAYA1-8B MoE model trained on AMD hardware outperforms larger rivals
Zyphra AI has released ZAYA1-8B, a Mixture of Experts (MoE) language model with 760 million active parameters and 8.4 billion total parameters. Trained on AMD hardware, this model demonstrates competitive performance ag…
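A quick check on the figures above shows how sparse the model is per token. Treating the active-to-total ratio as a rough proxy for per-token compute relative to a dense model of the same total size is an assumption that ignores attention and routing overhead.

```python
total_params = 8.4e9    # total parameters, from the summary
active_params = 760e6   # parameters active per token, from the summary

fraction = active_params / total_params
print(f"{fraction:.1%} of weights active per token")  # 9.0% of weights active per token
```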
-
Tropical geometry reveals sparsity is combinatorial depth in MoE models
A new paper introduces a theoretical framework for understanding Mixture-of-Experts (MoE) models using tropical geometry. The research establishes that the routing mechanism in MoE architectures is equivalent to a speci…
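The summary is cut off before stating the exact equivalence. As a hedged illustration of why tropical geometry is a natural lens: with a linear router, the winning expert's score is max over i of (w_i · x + b_i), a maximum of affine functions, which is a tropical (max-plus) polynomial in x. The sketch shows ordinary top-k selection next to that max-plus evaluation; it is a generic linear router with made-up weights, not the paper's construction.

```python
def router_scores(x, weights, biases):
    """Linear router: one affine score w_i . x + b_i per expert."""
    return [sum(wj * xj for wj, xj in zip(w, x)) + b for w, b in zip(weights, biases)]

def top_k(scores, k):
    """Indices of the k highest-scoring experts, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def tropical_eval(x, weights, biases):
    """Max-plus evaluation: the maximum affine score, i.e. the top-1 routing score."""
    return max(router_scores(x, weights, biases))

W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy router weights for 3 experts
b = [0.0, 0.5, -1.0]                      # toy router biases
x = [2.0, 0.5]                            # toy token representation

s = router_scores(x, W, b)
print(s)                     # [2.0, 1.0, 1.5]
print(top_k(s, 2))           # [0, 2]
print(tropical_eval(x, W, b))  # 2.0
```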
-
MoLF model predicts pan-cancer gene expression from histology images
Researchers have developed MoLF, a novel generative model designed for predicting pan-cancer spatial gene expression from histology images. This model utilizes a conditional Flow Matching objective and a Mixture-of-Expe…
-
LAWS architecture offers self-certifying inference caching for LLMs and robotics
Researchers have introduced LAWS, a novel caching architecture designed to improve the efficiency of neural inference, robotics, and edge deployments. This system builds a library of certified expert functions by observ…
-
Geometry-aware model advances whole-slide image analysis in computational pathology
Researchers have developed BatMIL, a novel framework for analyzing whole-slide histopathological images. This approach utilizes a hybrid hyperbolic-Euclidean representation to better capture hierarchical tissue structur…
-
NVIDIA open-sources cuDNN kernels after 12 years, including MoE and sparse attention
NVIDIA has open-sourced parts of its cuDNN library, a significant move after 12 years of it being closed-source. This release includes over 20 Mixture-of-Experts (MoE) kernels and NSA sparse attention kernels. The codeb…
-
SMoE paper proposes expert substitution for efficient edge MoE deployment
Researchers have developed SMoE, a novel algorithm-system co-design aimed at enabling Mixture of Experts (MoE) models to run on edge devices. This approach tackles memory limitations by dynamically offloading experts an…
-
Apple researchers unveil SpecMD for faster MoE model inference
Apple's machine learning research team has published a paper detailing SpecMD, a new framework for evaluating Mixture-of-Experts (MoE) model caching policies. Their experiments show that traditional caching assumptions …
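The summary does not describe the caching policies SpecMD evaluates. For context, the sketch below is a generic least-recently-used (LRU) cache of expert weights that replays a trace of requested expert IDs and reports the hit rate, the kind of baseline a caching-policy study might compare against. The trace and capacity are made-up toy values.

```python
from collections import OrderedDict

def lru_hit_rate(trace, capacity):
    """Replay a sequence of requested expert IDs through an LRU cache; return the fraction of hits."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    cache = OrderedDict()  # expert_id -> placeholder for loaded weights
    hits = 0
    for expert_id in trace:
        if expert_id in cache:
            hits += 1
            cache.move_to_end(expert_id)   # mark as most recently used
        else:
            cache[expert_id] = None        # "load" the expert into fast memory
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used expert
    return hits / len(trace) if trace else 0.0

print(lru_hit_rate([0, 1, 0, 2, 0, 1, 3, 0], capacity=2))  # 0.25
```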
-
RD-ViT cuts data needs for segmentation, outperforming standard ViT with fewer parameters
Researchers have developed RD-ViT, a novel Recurrent-Depth Vision Transformer designed for semantic segmentation tasks. This architecture significantly reduces data dependence by using a single, shared transformer block…
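The core idea behind recurrent depth is weight sharing: one block's parameters are reused at every step, so effective depth grows without adding parameters. The toy sketch uses a hypothetical `shared_block` (a scalar residual update standing in for a transformer block) to show that the parameter tuple stays fixed while the number of applications varies; it is not RD-ViT's architecture.

```python
import math

def shared_block(h, params):
    """Hypothetical stand-in for a transformer block: residual update h + tanh(w*h + b)."""
    w, b = params
    return [v + math.tanh(w * v + b) for v in h]

def recurrent_depth_forward(h, params, steps):
    """Apply the same block `steps` times; the same parameters are shared across all steps."""
    for _ in range(steps):
        h = shared_block(h, params)
    return h

params = (0.5, 0.0)  # two parameters, regardless of how many steps are run
x = [0.0, 1.0]
shallow = recurrent_depth_forward(x, params, steps=1)
deep = recurrent_depth_forward(x, params, steps=4)
print(shallow, deep)
```

Changing `steps` changes the computation depth but not `len(params)`, which is the property the summary attributes to the shared block.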
-
OneTrackerV2 unifies multimodal visual tracking with Dual Mixture-of-Experts
Researchers have developed a new event-based visual object tracking framework that addresses limitations of existing methods by explicitly modeling event density variations across multiple temporal scales. This approach…
-
RAST-MoE-RL framework enhances ride-hailing efficiency with specialized AI experts
Researchers have developed a new framework called RAST-MoE-RL to improve efficiency in ride-hailing services. This framework utilizes a Mixture-of-Experts (MoE) approach within deep reinforcement learning to better hand…
-
Attention Sink research reveals inherent MoE structure in LLM attention layers
Researchers have identified that the attention sink phenomenon in Large Language Models, where the first token receives disproportionate attention, naturally forms a Mixture-of-Experts (MoE) mechanism within attention l…
-
Xiaomi unveils MiMo-V2.5-Pro AI model for automated programming tasks
Xiaomi has unveiled its MiMo-V2.5-Pro language model, designed to automate complex programming tasks. Leveraging a Mixture-of-Experts architecture and reduced token requirements, the model can handle processes that prev…
-
Mamoda2.5 model integrates multimodal AI with efficient DiT-MoE for top video editing
Researchers have introduced Mamoda2.5, a unified AR-Diffusion framework designed for multimodal understanding and generation. This model utilizes a Diffusion Transformer backbone enhanced with a Mixture-of-Experts (MoE)…
-
Researchers explore quantum neural networks via mixture of experts
Researchers have established a mean-field limit for Mixture of Experts (MoE) models trained using gradient flow in supervised learning scenarios. Their findings demonstrate that as the number of experts increases, the m…
-
GMGaze model achieves SOTA gaze estimation with CLIP and multiscale transformer
Researchers have introduced GMGaze, a novel approach to gaze estimation that utilizes a multi-scale transformer architecture and incorporates context-aware conditioning. This method addresses limitations in existing mod…
-
LightKV reduces LVLM KV cache size and computation by compressing vision tokens
Researchers have developed LightKV, a new method to reduce the GPU memory overhead associated with Large Vision-Language Models (LVLMs). By exploiting redundancy in vision-token embeddings and using prompt-aware guidanc…
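The summary does not say how LightKV compresses tokens. As a generic illustration of exploiting redundancy among vision-token embeddings, the sketch greedily merges tokens whose cosine similarity to an existing group's mean exceeds a threshold (an assumed value of 0.95), shrinking the number of entries a KV cache would have to hold. It ignores LightKV's prompt-aware guidance entirely, and `merge_redundant` is a hypothetical helper.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def merge_redundant(tokens, threshold=0.95):
    """Greedily average each token into the first group whose mean is similar enough; else start a new group."""
    groups = []  # list of (running sum vector, token count)
    for t in tokens:
        for i, (s, n) in enumerate(groups):
            mean = [v / n for v in s]
            if cosine(mean, t) >= threshold:
                groups[i] = ([a + c for a, c in zip(s, t)], n + 1)
                break
        else:
            groups.append((list(t), 1))
    return [[v / n for v in s] for s, n in groups]

tokens = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]  # toy embeddings: the first two are near-duplicates
merged = merge_redundant(tokens)
print(len(merged))  # 2
```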
-
FluxMoE system decouples expert weights for faster LLM serving
Researchers have developed FluxMoE, a new system designed to improve the efficiency of serving Mixture-of-Experts (MoE) models. FluxMoE addresses the challenge of large parameter sizes in MoE models by decoupling expert…
-
Study finds switchless networks more cost-effective for MoE LLM serving
A new paper analyzes network topologies for Mixture-of-Experts (MoE) Large Language Model (LLM) serving, finding that lower-cost, switchless networks can be more cost-effective than expensive scale-up infrastructures. T…