mixture of experts
PulseAugur coverage of mixture of experts — every cluster mentioning mixture of experts across labs, papers, and developer communities, ranked by signal.
- instance of arXiv 90%
- instance of Innu-aimun 90%
- instance of DeepSeek V4-Flash 90%
- uses large-language models 80%
- used by large-language models 80%
- instance of transformers 70%
- uses LLM 70%
- used by transformer 70%
- instance of transformer 70%
- used by Emo 70%
- developed by Emo 70%
- competes with transformers 50%
- 2026-05-11 research_milestone A new paper proposes an enhanced Mixture-of-Experts framework for faster time series forecasting model training. Source
12 days with sentiment data
- Meta releases Llama 4 with Mixture of Experts architecture
Meta released Llama 4 in April 2025, featuring a new Mixture of Experts (MoE) architecture. Two variants, Scout and Maverick, are available, with Scout serving as a balanced default and Maverick offering broader kno…
- ZipMoE system enables efficient on-device serving of large language models
Researchers have developed ZipMoE, a system designed to make Mixture-of-Experts (MoE) large language models more efficient for on-device deployment. ZipMoE utilizes lossless compression and a cache-affinity scheduling a…
- Fireworks AI flags numerical drift in LLM training vs. serving
Fireworks AI has identified critical numerical parity bugs that can arise when training and serving large language models, particularly Mixture-of-Experts (MoE) architectures. These discrepancies, stemming from the non-…
- Alibaba's Qwen3-Coder-Next achieves 70.6 on SWE-Bench with sparse MoE
Alibaba's Qwen3-Coder-Next, an 80-billion-parameter model with 3 billion active parameters, has achieved a score of 70.6 on the SWE-Bench Verified benchmark. This performance is notable as it rivals top closed-source model…
- New decoding method tackles hallucinations in vision-language models
Researchers have developed a new inference-time framework called CHASd to combat hallucinations in Large Vision-Language Models (LVLMs). This method, Contrastive Hallucination-Aware Step-wise Decoding, selectively activ…
- Research quantifies LLM performance, energy, and privacy trade-offs on mobile devices
A new research paper explores the trade-offs between performance, energy consumption, and privacy when running large language models on mobile devices. The study developed an experimental pipeline to measure these facto…
- New research tackles continual learning in LLMs with novel MoE methods
Two new research papers propose novel approaches to continual learning in large language and vision-language models, aiming to mitigate catastrophic forgetting. CP-MoE introduces a transient expert to guide updates and …
- SpikingMoE integrates Mixture-of-Experts into spike-driven Transformers
Researchers have introduced SpikingMoE, a novel framework that combines Spiking Neural Networks (SNNs) with a Mixture-of-Experts (MoE) architecture. This approach utilizes a spike-driven prompt (SDprompt) for biological…
- FAME framework uses LLMs for efficient log anomaly detection
Researchers have developed FAME, a novel framework for message-level log anomaly detection that significantly reduces the need for manual labeling. This system utilizes a Mixture-of-Experts approach, employing large lan…
- OpenAI o3 disproves conjecture, eyes $850B IPO; Cohere releases MoE model
OpenAI's latest model, o3, has reportedly disproven an Erdős conjecture through extensive reasoning. Concurrently, OpenAI is rumored to be preparing for an IPO with a valuation of $850 billion. In related news, Cohere h…
- AI efficiency vs. interpretability: a sparse vs. dense tradeoff
The human brain's extreme energy efficiency, estimated to be 10,000 times greater than current AI models, is attributed to its sparse and localized processing. While techniques like mixture-of-experts offer a path towar…
- New research enables efficient hyperparameter transfer for large neural networks
Researchers have developed new methods for hyperparameter transfer, enabling more efficient scaling of large neural networks. One paper introduces a parameterization justified by dynamical mean-field theory, allowing re…
- FedCoE framework balances generalization and personalization in Federated Learning
Researchers have introduced FedCoE, a novel framework for Federated Learning that aims to balance global generalization with local personalization. Unlike traditional methods that struggle with non-IID data or overfit t…
- New tool DODOCO reveals flaws in MoE model dispatch benchmarks
A new research paper introduces DODOCO, a tool designed to diagnose overhead in dispatch operations for Mixture-of-Experts (MoE) models. The study found that common assumptions about workload representation in benchmark…
- New HDMoE framework enhances cancer survival prediction with multimodal data
Researchers have developed a new framework called HDMoE to improve multimodal cancer survival prediction. This hierarchical decoupling-fusion mixture-of-experts approach aims to better integrate data from sources like w…
- Dynamic TMoE framework improves time series forecasting with adaptive experts
Researchers have developed Dynamic TMoE, a novel framework designed to improve non-stationary time series forecasting. This approach addresses the limitations of existing Mixture-of-Experts (MoE) models by dynamically a…
- Vision MoE models show stable animate-inanimate expert specialization
Researchers have developed new methods to analyze the internal workings of Mixture-of-Experts (MoE) models in computer vision. Their work moves beyond simply examining how data is routed to specific "experts" within the…
- New MoE framework enhances brain decoding with network-aware experts
Researchers have developed FPED, a novel Mixture-of-Experts (MoE) framework designed for interpretable brain decoding using fMRI data. This approach explicitly models different functional brain networks as specialized e…
- DeepSeek V4 debuts with MegaMoE optimizations for efficient MoE
DeepSeek has released its V4 model, featuring significant optimizations through a new system called MegaMoE. This system utilizes a 1400-line fused CUDA kernel to enhance performance by fine-grained pipelining of commun…
- New $\phi$-balancing framework improves MoE model training
Researchers have introduced a new framework called $\phi$-balancing to improve the training of Mixture-of-Experts (MoE) models. This method aims to achieve better expert utilization by directly targeting population-leve…