ENTITY mixture of experts

PulseAugur coverage of mixture of experts — every cluster mentioning mixture of experts across labs, papers, and developer communities, ranked by signal.

Total · 30d

100

100 over 90d

Releases · 30d

0 over 90d

Papers · 30d

76 over 90d

TIER MIX · 90D

frontier release 8
significant 3
research 36
tool 49
commentary 4

TOPICS

paper 76
model release 56
infra 32
product 19
other 16
safety 7
funding 1

RELATIONSHIPS

instance of Mixture of Experts (MoE) 95%
instance of Emo 95%
instance of arXiv 90%
used by large-language models 90%
instance of Innu-aimun 90%
used by SGLang 90%
instance of DeepSeek V4-Flash 90%
uses large-language models 80%
instance of large-language models 70%
instance of transformers 70%
instance of LLM 70%
used by LLM 70%

TIMELINE

2026-05-11 · research milestone · A new paper proposes an enhanced Mixture-of-Experts framework for faster training of time series forecasting models.

SENTIMENT · 30D

19 days with sentiment data

RECENT · PAGE 4/5 · 100 TOTAL

RESEARCH · CL_21794 · May 7 · 15:23

New parameter E predicts Mixture-of-Experts model health, preventing dead experts

Researchers have introduced a new dimensionless control parameter, E = T*H/(O+B), to predict the health of expert ecologies in Mixture-of-Experts (MoE) models. This parameter, derived from four hyperparameters, can prev…
TOOL · CL_20870 · May 7 · 05:44

Zyphra's ZAYA1-8B MoE model trained on AMD hardware outperforms larger rivals

Zyphra AI has released ZAYA1-8B, a Mixture of Experts (MoE) language model with 760 million active parameters and 8.4 billion total parameters. Trained on AMD hardware, this model demonstrates competitive performance ag…
TOOL · CL_20549 · May 7 · 04:00

Tropical geometry reveals sparsity is combinatorial depth in MoE models

A new paper introduces a theoretical framework for understanding Mixture-of-Experts (MoE) models using tropical geometry. The research establishes that the routing mechanism in MoE architectures is equivalent to a speci…
TOOL · CL_20547 · May 7 · 04:00

MoLF model predicts pan-cancer gene expression from histology images

Researchers have developed MoLF, a novel generative model designed for predicting pan-cancer spatial gene expression from histology images. This model utilizes a conditional Flow Matching objective and a Mixture-of-Expe…
TOOL · CL_20383 · May 7 · 04:00

LAWS architecture offers self-certifying inference caching for LLMs and robotics

Researchers have introduced LAWS, a novel caching architecture designed to improve the efficiency of neural inference, robotics, and edge deployments. This system builds a library of certified expert functions by observ…
RESEARCH · CL_20274 · May 6 · 17:33

Geometry-aware model advances whole-slide image analysis in computational pathology

Researchers have developed BatMIL, a novel framework for analyzing whole-slide histopathological images. This approach utilizes a hybrid hyperbolic-Euclidean representation to better capture hierarchical tissue structur…
RESEARCH · CL_18472 · May 6 · 04:00

NVIDIA open-sources cuDNN kernels after 12 years, including MoE and sparse attention

NVIDIA has open-sourced parts of its cuDNN library after 12 years as closed source. The release includes more than 20 Mixture-of-Experts (MoE) kernels and NSA sparse attention kernels. The codeb…
TOOL · CL_18630 · May 6 · 04:00

SMoE paper proposes expert substitution for efficient edge MoE deployment

Researchers have developed SMoE, a novel algorithm-system co-design aimed at enabling Mixture of Experts (MoE) models to run on edge devices. This approach tackles memory limitations by dynamically offloading experts an…
TOOL · CL_20119 · May 6 · 00:00

Apple researchers unveil SpecMD for faster MoE model inference

Apple's machine learning research team has published a paper detailing SpecMD, a new framework for evaluating Mixture-of-Experts (MoE) model caching policies. Their experiments show that traditional caching assumptions …
RESEARCH · CL_18667 · May 5 · 17:21

RD-ViT cuts data needs for segmentation, outperforming standard ViT with fewer parameters

Researchers have developed RD-ViT, a novel Recurrent-Depth Vision Transformer designed for semantic segmentation tasks. This architecture significantly reduces data dependence by using a single, shared transformer block…
RESEARCH · CL_18682 · May 5 · 13:05

OneTrackerV2 unifies multimodal visual tracking with Dual Mixture-of-Experts

Researchers have developed a new event-based visual object tracking framework that addresses limitations of existing methods by explicitly modeling event density variations across multiple temporal scales. This approach…
TOOL · CL_16235 · May 5 · 04:00

RAST-MoE-RL framework enhances ride-hailing efficiency with specialized AI experts

Researchers have developed a new framework called RAST-MoE-RL to improve efficiency in ride-hailing services. This framework utilizes a Mixture-of-Experts (MoE) approach within deep reinforcement learning to better hand…
TOOL · CL_15969 · May 5 · 04:00

Attention Sink research reveals inherent MoE structure in LLM attention layers

Researchers have identified that the attention sink phenomenon in Large Language Models, where the first token receives disproportionate attention, naturally forms a Mixture-of-Experts (MoE) mechanism within attention l…
RESEARCH · CL_14912 · May 4 · 19:00

Xiaomi unveils MiMo-V2.5-Pro AI model for automated programming tasks

Xiaomi has unveiled its MiMo-V2.5-Pro language model, designed to automate complex programming tasks. Leveraging a Mixture-of-Experts architecture and reduced token requirements, the model can handle processes that prev…
RESEARCH · CL_15510 · May 4 · 14:26

Mamoda2.5 model integrates multimodal AI with efficient DiT-MoE for top video editing

Researchers have introduced Mamoda2.5, a unified AR-Diffusion framework designed for multimodal understanding and generation. This model utilizes a Diffusion Transformer backbone enhanced with a Mixture-of-Experts (MoE)…
RESEARCH · CL_14460 · May 4 · 04:00

Researchers explore quantum neural networks via mixture of experts

Researchers have established a mean-field limit for Mixture of Experts (MoE) models trained using gradient flow in supervised learning scenarios. Their findings demonstrate that as the number of experts increases, the m…
RESEARCH · CL_14045 · May 1 · 17:35

GMGaze model achieves SOTA gaze estimation with CLIP and multiscale transformer

Researchers have introduced GMGaze, a novel approach to gaze estimation that utilizes a multi-scale transformer architecture and incorporates context-aware conditioning. This method addresses limitations in existing mod…
RESEARCH · CL_14047 · May 1 · 17:11

LightKV reduces LVLM KV cache size and computation by compressing vision tokens

Researchers have developed LightKV, a new method to reduce the GPU memory overhead associated with Large Vision-Language Models (LVLMs). By exploiting redundancy in vision-token embeddings and using prompt-aware guidanc…
RESEARCH · CL_11925 · May 1 · 04:00

FluxMoE system decouples expert weights for faster LLM serving

Researchers have developed FluxMoE, a new system designed to improve the efficiency of serving Mixture-of-Experts (MoE) models. FluxMoE addresses the challenge of large parameter sizes in MoE models by decoupling expert…
RESEARCH · CL_14183 · Apr 30 · 21:35

Study finds switchless networks more cost-effective for MoE LLM serving

A new paper analyzes network topologies for Mixture-of-Experts (MoE) Large Language Model (LLM) serving, finding that lower-cost, switchless networks can be more cost-effective than expensive scale-up infrastructures. T…
