Entity: mixture of experts

mixture of experts

PulseAugur coverage of mixture of experts — every cluster mentioning mixture of experts across labs, papers, and developer communities, ranked by signal.

Total · 30 days

Past 90 days: 74

Releases · 30 days

Past 90 days: 0

Papers · 30 days

Past 90 days: 59

Tier distribution · 90 days

frontier release 4
significant 3
research 30
tool 36
commentary 1

Relationships

instance of arXiv 90%
instance of Innu-aimun 90%
instance of DeepSeek V4-Flash 90%
uses large-language models 80%
used by large-language models 80%
instance of transformers 70%
uses LLM 70%
used by transformer 70%
instance of transformer 70%
used by Emo 70%
developed by Emo 70%
competes with transformers 50%

Timeline

2026-05-11 research_milestone A new paper proposes an enhanced Mixture-of-Experts framework for faster time series forecasting model training. Source

Sentiment · 30 days

12 days with sentiment data

Recent · Page 2 of 4 · 74 items total

RESEARCH · CL_36345 · May 14 · 20:39

New $\phi$-balancing framework improves MoE model training

Researchers have introduced a new framework called $\phi$-balancing to improve the training of Mixture-of-Experts (MoE) models. This method aims to achieve better expert utilization by directly targeting population-leve…
RESEARCH · CL_32718 · May 14 · 02:48

MetaMoE unifies private MoE models using public proxy data

Researchers have introduced MetaMoE, a novel framework designed to unify independently trained Mixture-of-Experts (MoE) models without requiring access to private client data. The system utilizes public proxy data to ap…
COMMENTARY · CL_29758 · May 13 · 09:03

MoE architectures are workarounds for LLM training instability, not ideal solutions

Mixture-of-Experts (MoE) architectures are often presented as an efficient solution for scaling large language models, but this analysis argues they are primarily a workaround for training instability in dense transform…
RESEARCH · CL_28307 · May 11 · 17:58

New research optimizes Sparse Mixture-of-Experts for efficient LLM scaling

Researchers are exploring new methods to optimize Sparse Mixture-of-Experts (SMoE) models, which are crucial for scaling large language models efficiently. One paper reveals a geometric coupling between routers and expe…
TOOL · CL_27710 · May 11 · 10:33

New MoE framework speeds up time series forecasting training

Researchers have developed a new Mixture-of-Experts (MoE) framework designed to accelerate the training of time series forecasting models. This method integrates expert-specific loss information directly into the traini…
RESEARCH · CL_25314 · May 10 · 18:50

EMO AI Model Achieves High Performance with Minimal Experts

Researchers from the Allen Institute for AI and UC Berkeley have developed a new Mixture-of-Experts (MoE) model architecture named EMO. This model achieves nearly full performance while utilizing only 12.5% of its avail…
SIGNIFICANT · CL_23645 · May 9 · 00:10

DeepSeek releases open-source coding model matching GPT-4o

DeepSeek has released V3-0324, an open-source coding model that matches or surpasses leading models like GPT-4o and Claude 3.5 Sonnet in coding performance. This Mixture-of-Experts model, with 671 billion total paramete…
RESEARCH · CL_25612 · May 8 · 13:08

New research explores speculative decoding for faster LLM inference

Multiple research papers published on arXiv explore advancements in speculative decoding for Large Language Models (LLMs). These studies focus on improving inference speed and efficiency by using a smaller "draft" model…
TOOL · CL_25610 · May 8 · 05:26

MoE models misroute tokens on complex reasoning tasks, study finds

Researchers have identified a significant issue in Mixture-of-Experts (MoE) language models where the routing mechanism, which directs tokens to specific experts, often selects suboptimal paths. While the standard route…
TOOL · CL_22046 · May 8 · 04:00

New MoE inference design uses pooled HBM to cut communication latency on Ascend

Researchers have developed a new communication design for Mixture-of-Experts (MoE) inference on Ascend systems, aiming to reduce bottlenecks in token exchange. This approach eliminates intermediate relay and reordering …
TOOL · CL_21909 · May 8 · 04:00

Graph Normalization offers differentiable approximation for NP-hard MWIS problem

Researchers have developed Graph Normalization (GN), a novel dynamical system that approximates the NP-hard Maximum Weight Independent Set (MWIS) problem. GN offers a principled and differentiable approach, converging t…
TOOL · CL_21907 · May 8 · 04:00

New research explores finite expert banks for communication-efficient MoE architectures

Researchers have developed a new framework for analyzing sparse Mixture-of-Experts (MoE) architectures, focusing on communication efficiency. They propose treating the MoE gate as a stochastic channel and quantifying ro…
RESEARCH · CL_22189 · May 7 · 17:59

EMO model enables modularity in large language models with selective expert use

Researchers have developed EMO, a novel Mixture-of-Experts (MoE) model designed for emergent modularity. Unlike traditional monolithic large language models, EMO activates only specific subsets of its parameters for dif…
RESEARCH · CL_21995 · May 7 · 15:45

New SAMoE-C method improves CSI-based HAR with scene-adaptive experts

Researchers have developed a new method called Scene-Adaptive Mixture of Experts with Clustered Specialists (SAMoE-C) to improve human activity recognition using channel state information (CSI). This approach addresses …
RESEARCH · CL_21794 · May 7 · 15:23

New parameter E predicts Mixture-of-Experts model health, preventing dead experts

Researchers have introduced a new dimensionless control parameter, E = T*H/(O+B), to predict the health of expert ecologies in Mixture-of-Experts (MoE) models. This parameter, derived from four hyperparameters, can prev…
TOOL · CL_20870 · May 7 · 05:44

Zyphra's ZAYA1-8B MoE model trained on AMD hardware outperforms larger rivals

Zyphra AI has released ZAYA1-8B, a Mixture of Experts (MoE) language model with 760 million active parameters and 8.4 billion total parameters. Trained on AMD hardware, this model demonstrates competitive performance ag…
TOOL · CL_20549 · May 7 · 04:00

Tropical geometry reveals sparsity is combinatorial depth in MoE models

A new paper introduces a theoretical framework for understanding Mixture-of-Experts (MoE) models using tropical geometry. The research establishes that the routing mechanism in MoE architectures is equivalent to a speci…
TOOL · CL_20547 · May 7 · 04:00

MoLF model predicts pan-cancer gene expression from histology images

Researchers have developed MoLF, a novel generative model designed for predicting pan-cancer spatial gene expression from histology images. This model utilizes a conditional Flow Matching objective and a Mixture-of-Expe…
TOOL · CL_20383 · May 7 · 04:00

LAWS architecture offers self-certifying inference caching for LLMs and robotics

Researchers have introduced LAWS, a novel caching architecture designed to improve the efficiency of neural inference, robotics, and edge deployments. This system builds a library of certified expert functions by observ…
RESEARCH · CL_20274 · May 6 · 17:33

Geometry-aware model advances whole-slide image analysis in computational pathology

Researchers have developed BatMIL, a novel framework for analyzing whole-slide histopathological images. This approach utilizes a hybrid hyperbolic-Euclidean representation to better capture hierarchical tissue structur…
