mixture of experts
PulseAugur coverage of mixture of experts — every cluster mentioning mixture of experts across labs, papers, and developer communities, ranked by signal.
- instance of arXiv 90%
- instance of Innu-aimun 90%
- instance of DeepSeek V4-Flash 90%
- uses large-language models 80%
- used by large-language models 80%
- instance of transformers 70%
- uses LLM 70%
- used by transformer 70%
- instance of transformer 70%
- used by Emo 70%
- developed by Emo 70%
- competes with transformers 50%
- 2026-05-11 research_milestone A new paper proposes an enhanced Mixture-of-Experts framework for faster time series forecasting model training. Source
Sentiment data available for 12 days
-
New $\phi$-balancing framework improves MoE model training
Researchers have introduced a new framework called $\phi$-balancing to improve the training of Mixture-of-Experts (MoE) models. This method aims to achieve better expert utilization by directly targeting population-leve…
-
MetaMoE unifies private MoE models using public proxy data
Researchers have introduced MetaMoE, a novel framework designed to unify independently trained Mixture-of-Experts (MoE) models without requiring access to private client data. The system utilizes public proxy data to ap…
-
MoE architectures are workarounds for LLM training instability, not ideal solutions
Mixture-of-Experts (MoE) architectures are often presented as an efficient solution for scaling large language models, but this analysis argues they are primarily a workaround for training instability in dense transform…
-
New research optimizes Sparse Mixture-of-Experts for efficient LLM scaling
Researchers are exploring new methods to optimize Sparse Mixture-of-Experts (SMoE) models, which are crucial for scaling large language models efficiently. One paper reveals a geometric coupling between routers and expe…
-
New MoE framework speeds up time series forecasting training
Researchers have developed a new Mixture-of-Experts (MoE) framework designed to accelerate the training of time series forecasting models. This method integrates expert-specific loss information directly into the traini…
-
EMO AI model achieves high performance with minimal experts
Researchers from the Allen Institute for AI and UC Berkeley have developed a new Mixture-of-Experts (MoE) model architecture named EMO. This model achieves nearly full performance while utilizing only 12.5% of its avail…
-
DeepSeek releases open-source coding model matching GPT-4o
DeepSeek has released V3-0324, an open-source coding model that matches or surpasses leading models like GPT-4o and Claude 3.5 Sonnet in coding performance. This Mixture-of-Experts model, with 671 billion total paramete…
-
New research explores speculative decoding for faster LLM inference
Multiple research papers published on arXiv explore advancements in speculative decoding for Large Language Models (LLMs). These studies focus on improving inference speed and efficiency by using a smaller "draft" model…
-
MoE models misroute tokens on complex reasoning tasks, study finds
Researchers have identified a significant issue in Mixture-of-Experts (MoE) language models where the routing mechanism, which directs tokens to specific experts, often selects suboptimal paths. While the standard route…
-
New MoE inference design uses pooled HBM to cut communication latency on Ascend
Researchers have developed a new communication design for Mixture-of-Experts (MoE) inference on Ascend systems, aiming to reduce bottlenecks in token exchange. This approach eliminates intermediate relay and reordering …
-
Graph Normalization offers differentiable approximation for NP-hard MWIS problem
Researchers have developed Graph Normalization (GN), a novel dynamical system that approximates the NP-hard Maximum Weight Independent Set (MWIS) problem. GN offers a principled and differentiable approach, converging t…
-
New research explores finite expert banks for communication-efficient MoE architectures
Researchers have developed a new framework for analyzing sparse Mixture-of-Experts (MoE) architectures, focusing on communication efficiency. They propose treating the MoE gate as a stochastic channel and quantifying ro…
-
EMO model enables modularity in large language models with selective expert use
Researchers have developed EMO, a novel Mixture-of-Experts (MoE) model designed for emergent modularity. Unlike traditional monolithic large language models, EMO activates only specific subsets of its parameters for dif…
-
New SAMoE-C method improves CSI-based HAR with scene-adaptive experts
Researchers have developed a new method called Scene-Adaptive Mixture of Experts with Clustered Specialists (SAMoE-C) to improve human activity recognition using channel state information (CSI). This approach addresses …
-
New parameter E predicts Mixture-of-Experts model health, preventing dead experts
Researchers have introduced a new dimensionless control parameter, $E = T \cdot H / (O + B)$, to predict the health of expert ecologies in Mixture-of-Experts (MoE) models. This parameter, derived from four hyperparameters, can prev…
-
Zyphra's ZAYA1-8B MoE model trained on AMD hardware outperforms larger rivals
Zyphra AI has released ZAYA1-8B, a Mixture of Experts (MoE) language model with 760 million active parameters and 8.4 billion total parameters. Trained on AMD hardware, this model demonstrates competitive performance ag…
-
Tropical geometry reveals sparsity is combinatorial depth in MoE models
A new paper introduces a theoretical framework for understanding Mixture-of-Experts (MoE) models using tropical geometry. The research establishes that the routing mechanism in MoE architectures is equivalent to a speci…
-
MoLF model predicts pan-cancer gene expression from histology images
Researchers have developed MoLF, a novel generative model designed for predicting pan-cancer spatial gene expression from histology images. This model utilizes a conditional Flow Matching objective and a Mixture-of-Expe…
-
LAWS architecture offers self-certifying inference caching for LLMs and robotics
Researchers have introduced LAWS, a novel caching architecture designed to improve the efficiency of neural inference, robotics, and edge deployments. This system builds a library of certified expert functions by observ…
-
Geometry-aware model advances whole-slide image analysis in computational pathology
Researchers have developed BatMIL, a novel framework for analyzing whole-slide histopathological images. This approach utilizes a hybrid hyperbolic-Euclidean representation to better capture hierarchical tissue structur…
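
Several items above refer to the MoE routing mechanism that directs each token to a subset of experts. As a rough illustration only, here is a minimal sketch of generic top-k softmax routing in plain Python. It is not taken from any of the papers listed; the function names (`top_k_route`, `moe_forward`), the toy experts, and the logits are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    Ties keep the lower expert index first (Python's sort is stable).
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, router_logits, experts, k):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Hypothetical toy setup: four scalar "experts" that just scale the input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]  # assumed router scores for one token

routes = top_k_route(logits, k=2)
output = moe_forward(1.0, logits, experts, k=2)
print(routes, output)
```

Only the selected experts are evaluated, which is why a sparse MoE model can have far fewer active parameters per token than total parameters. Real systems add load-balancing losses, capacity limits, and batched dispatch, none of which are modeled here.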