mixture of experts
PulseAugur coverage of mixture of experts — every cluster mentioning mixture of experts across labs, papers, and developer communities, ranked by signal.
- instance of Mixture of Experts (MoE) 95%
- instance of Emo 95%
- instance of arXiv 90%
- instance of large-language models 90%
- instance of Innu-aimun 90%
- instance of GPT OSS 20B 90%
- used by SGLang 90%
- uses Mamba 90%
- instance of Sparse Mixture of Experts 90%
- instance of DeepSeek MoE 90%
- instance of DeepSeek-V4 Flash 90%
- instance of OLMoE-1B-7B 90%
- 2026-05-11 research milestone: A new paper proposes an enhanced Mixture-of-Experts framework to speed up training of time series forecasting models.
-
New SharpMoE framework enhances diffusion models with accurate routing
Researchers have developed SharpMoE, a new framework designed to improve the efficiency and performance of Mixture-of-Experts (MoE) diffusion models used in visual generation. The framework addresses a routing assignmen…
-
Study questions modularity of frontier Mixture-of-Experts models
A new study published on arXiv investigates the modularity of Mixture-of-Experts (MoE) models, specifically testing the Command A+ model. The research found that apparent functional modularity in these models is often r…
-
SARA framework enhances multilingual capabilities in Mixture-of-Experts models
Researchers have introduced SARA (Semantically Anchored Routing Alignment), a new framework designed to improve the performance of Mixture-of-Experts (MoE) models in low-resource languages. SARA addresses the issue wher…
-
New method learns domain generalization via subset-shared invariances
Researchers have introduced a new approach to domain generalization called subset-shared invariance, which addresses limitations of current methods that enforce global invariance across all source domains. This new tech…
-
Automated pipeline uncovers bias in MoE4 architecture search
Researchers have developed an automated pipeline to explore heterogeneous 4-Expert Mixture-of-Experts (MoE4) architectures within the LEMUR dataset ecosystem. This pipeline systematically combines base architecture fami…
-
MoE models show mixed inference performance on consumer and edge hardware
A recent study investigated whether Mixture-of-Experts (MoE) language models offer practical inference advantages on consumer and edge hardware. The research found that while MoE models theoretically reduce per-token co…
-
RAVEN model enhances financial forecasting with adaptive context windows
Researchers have introduced RAVEN, a novel Mixture-of-Experts framework designed to improve financial time series forecasting. Unlike traditional models that use fixed context windows, RAVEN adaptively determines the op…
-
New MoE framework integrates diverse architectures for improved plant disease classification
Researchers have developed a novel adaptive soft Mixture-of-Experts (MoE) framework designed to improve plant leaf disease classification. This framework integrates three distinct architectures—EfficientNet-B0, DenseNet…
-
New framework uses hierarchical RL for neural network compression
Researchers have developed HiReLC, a hierarchical reinforcement learning framework designed to jointly quantize and prune deep neural networks. This approach uses low-level agents for per-kernel configurations and high-…
-
New RAD method controls MoE language model reasoning without text analysis
Researchers have developed a new method called RAD (Routing Agreement Decoding) for controlling reasoning in sparse Mixture-of-Experts (MoE) language models. This technique leverages the internal routing states of MoE m…
-
NVIDIA Nemotron 3 Nano: Open Model for Efficient AI Agents
NVIDIA has released Nemotron 3 Nano, a 30-billion parameter open model designed for efficient reasoning and long-context applications. This model utilizes a hybrid Mixture-of-Experts architecture, activating only a frac…
-
NVIDIA unveils efficient Nemotron 3 LLM family with hybrid architecture
NVIDIA has released two new large language models, Nemotron 3 Nano and Nemotron 3 Ultra, focusing on efficiency and advanced capabilities. Nemotron 3 Nano is a 30B-class model designed for private inference and agentic …
-
DeepSeek unveils V4 models with 1M token context and MoE architecture
DeepSeek has released a preview of its DeepSeek-V4 series of Mixture-of-Experts (MoE) language models, featuring DeepSeek-V4-Pro (1.6T parameters) and DeepSeek-V4-Flash (284B parameters). Both models support an unpreced…
-
LLM Cross-Lingual Transfer: Task Alignment Over Linguistic Family
A new research paper explores cross-lingual transfer in large language models, specifically examining Arabic fine-tuning and its impact on Semitic languages. The study found no evidence of Semitic-specific transfer, ind…
-
New framework improves speaker verification for non-verbal vocalizations
Researchers have developed a new framework for speaker verification that improves accuracy for non-verbal vocalizations (NVVs) while preserving performance on speech. The system combines frozen self-supervised features …
-
New research analyzes MoE model calibration and discontinuities · 4 sources tracked
Two new research papers explore the complexities of Mixture-of-Experts (MoE) models, particularly concerning calibration and discontinuities. The first paper investigates how expert-level calibration impacts MoE perform…
-
FoMoE system partitions LLM experts to reduce distributed training costs
Researchers have introduced FoMoE, a novel system designed to overcome the limitations of training large language models (LLMs) across geographically distributed data centers. Unlike previous methods that required full …
-
New research enables editable and composable KV cache for LLMs
A new research paper introduces a method for optimizing KV cache usage in large language models, enabling editable and composable notes within the prefill stage. This approach allows for efficient editing of model…
-
Mixture of Experts (MoE) enhances AI model inference speed
Mixture of Experts (MoE) is presented as a solution to slow model inference times. By optimizing token routing, MoE architectures can effectively scale to handle increased request volumes. This approach aims to improve …
-
SoftMoE introduces differentiable routing for Mixture-of-Experts LLMs
Researchers have introduced SoftMoE, a novel approach to Mixture-of-Experts (MoE) architectures for Large Language Models (LLMs). Unlike traditional sparse MoE models that use a non-differentiable top-k routing mechanis…
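For readers unfamiliar with the top-k routing mentioned in the summary above, here is a minimal sketch of conventional sparse gating in plain Python. It is not SoftMoE's method, which the truncated summary does not describe. The function names (`top_k_route`, `moe_forward`), the toy scalar experts, and the choice to renormalize the softmax over only the selected experts are illustrative assumptions; real implementations vary on these points.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return {expert_index: weight} for the k highest-scoring experts.

    Assumption: weights are a softmax over the selected logits only.
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Hard selection: choosing indices is a discrete step, so unselected
    # experts contribute nothing to the output (and receive no gradient).
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return dict(zip(chosen, weights))


def moe_forward(x, router_logits, experts, k):
    """Combine the outputs of the selected experts, weighted by the router."""
    routes = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in routes.items())


# Hypothetical toy experts acting on a single number.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
logits = [0.1, 2.0, -1.0, 2.0]  # made-up router scores for one token

routes = top_k_route(logits, k=2)
output = moe_forward(3.0, logits, experts, k=2)
```

With these made-up scores, only experts 1 and 3 run for the token, which is where the per-token compute saving of sparse MoE comes from. The index selection in `top_k_route` is the step the summary describes as non-differentiable.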