mixture of experts
PulseAugur coverage of mixture of experts — every cluster mentioning mixture of experts across labs, papers, and developer communities, ranked by signal.
- 2026-05-11 research_milestone A new paper proposes an enhanced Mixture-of-Experts framework for faster training of time series forecasting models.
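Most of the items below rely on the same routing pattern. As a point of reference only, here is a minimal, framework-agnostic sketch of a top-k MoE layer in NumPy; it is not the forecasting paper's method, and every name in it is hypothetical. A learned gate scores the experts for each input, only the k highest-scoring experts run, and their outputs are mixed with renormalised gate weights.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, gate_w, experts, k=2):
    """Route each input row to its top-k experts and mix their outputs.

    x:        (batch, d_in) inputs
    gate_w:   (d_in, n_experts) gating weights
    experts:  list of callables, each mapping (d_in,) -> (d_out,)
    """
    scores = softmax(x @ gate_w)                      # (batch, n_experts)
    out = None
    for b, row in enumerate(x):
        top = np.argsort(scores[b])[-k:]              # indices of the k best experts
        w = scores[b, top] / scores[b, top].sum()     # renormalise over selected experts
        y = sum(wi * experts[i](row) for wi, i in zip(w, top))
        if out is None:
            out = np.zeros((x.shape[0], y.shape[0]))
        out[b] = y
    return out

# Toy usage: 4 linear experts, 8 inputs of width 16.
rng = np.random.default_rng(0)
d_in, d_out, n_experts = 16, 8, 4
expert_ws = [rng.normal(size=(d_in, d_out)) for _ in range(n_experts)]
experts = [lambda v, W=W: v @ W for W in expert_ws]
gate_w = rng.normal(size=(d_in, n_experts))
x = rng.normal(size=(8, d_in))
print(moe_layer(x, gate_w, experts).shape)            # (8, 8)
```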
-
RD-ViT cuts data needs for segmentation, outperforming standard ViT with fewer parameters
Researchers have developed RD-ViT, a novel Recurrent-Depth Vision Transformer designed for semantic segmentation tasks. This architecture significantly reduces data dependence by using a single, shared transformer block…
-
OneTrackerV2 unifies multimodal visual tracking with Dual Mixture-of-Experts
Researchers have developed a new event-based visual object tracking framework that addresses limitations of existing methods by explicitly modeling event density variations across multiple temporal scales. This approach…
-
RAST-MoE-RL framework enhances ride-hailing efficiency with specialized AI experts
Researchers have developed a new framework called RAST-MoE-RL to improve efficiency in ride-hailing services. This framework utilizes a Mixture-of-Experts (MoE) approach within deep reinforcement learning to better hand…
-
Attention Sink research reveals inherent MoE structure in LLM attention layers
Researchers have identified that the attention sink phenomenon in Large Language Models, where the first token receives disproportionate attention, naturally forms a Mixture-of-Experts (MoE) mechanism within attention l…
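As a toy illustration of that framing only (not the paper's analysis; all constants and names below are made up), the snippet constructs an attention matrix in which one token acts as a sink, and the closing comment spells out the MoE reading of each softmax row as a routing distribution over value vectors.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
seq, d = 8, 32
q = rng.normal(size=(seq, d))
k = rng.normal(size=(seq, d))

# Make the first token's key align with a direction shared by every query,
# mimicking a "sink" that all positions attend to by default.
shared = rng.normal(size=d)
q += 2.0 * shared
k[0] = 3.0 * shared

attn = softmax(q @ k.T / np.sqrt(d))   # (seq, seq) attention weights
print("average weight on token 0:", attn[:, 0].mean())
print("average weight elsewhere: ", attn[:, 1:].mean())

# MoE reading: each row of `attn` is a soft routing distribution over the
# value vectors ("experts"); the sink column behaves like a default expert
# that absorbs probability mass when no other value is strongly selected.
```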
-
Xiaomi unveils MiMo-V2.5-Pro AI model for automated programming tasks
Xiaomi has unveiled its MiMo-V2.5-Pro language model, designed to automate complex programming tasks. Leveraging a Mixture-of-Experts architecture and reduced token requirements, the model can handle processes that prev…
-
Mamoda2.5 model integrates multimodal AI with efficient DiT-MoE for top video editing
Researchers have introduced Mamoda2.5, a unified AR-Diffusion framework designed for multimodal understanding and generation. This model utilizes a Diffusion Transformer backbone enhanced with a Mixture-of-Experts (MoE)…
-
Researchers establish mean-field limit for Mixture of Experts models trained with gradient flow
Researchers have established a mean-field limit for Mixture of Experts (MoE) models trained using gradient flow in supervised learning scenarios. Their findings demonstrate that as the number of experts increases, the m…
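For orientation, mean-field results of this kind are typically phrased in terms of the empirical measure over expert parameters; the display below is a generic sketch of that setup under illustrative notation, not the paper's exact statement.

```latex
% Generic setup (illustrative notation, not the paper's): expert i has gate
% parameters \omega_i and expert parameters w_i, collected as \theta_i.
\[
  f_N(x) \;=\; \frac{1}{N}\sum_{i=1}^{N} g(x;\omega_i)\, h(x;w_i)
        \;=\; \int g(x;\omega)\, h(x;w)\,\mathrm{d}\mu_N(\theta),
  \qquad
  \mu_N \;=\; \frac{1}{N}\sum_{i=1}^{N} \delta_{\theta_i}.
\]
% As N grows, \mu_N converges to a limiting measure \mu, and gradient-flow
% training of the \theta_i is typically shown to correspond to a gradient
% flow on \mu itself.
```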
-
GMGaze model achieves SOTA gaze estimation with CLIP and multiscale transformer
Researchers have introduced GMGaze, a novel approach to gaze estimation that utilizes a multi-scale transformer architecture and incorporates context-aware conditioning. This method addresses limitations in existing mod…
-
FluxMoE system decouples expert weights for faster LLM serving
Researchers have developed FluxMoE, a new system designed to improve the efficiency of serving Mixture-of-Experts (MoE) models. FluxMoE addresses the challenge of large parameter sizes in MoE models by decoupling expert…
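This is not FluxMoE's actual interface; the sketch below, with hypothetical names (`ExpertStore`, `serve_token`), only illustrates the general idea the summary points at: expert weights live outside the dense serving path and are fetched only when the router selects them.

```python
import numpy as np

class ExpertStore:
    """Holds expert weights outside the serving process (here a plain dict,
    standing in for host memory, SSD, or a remote parameter server)."""

    def __init__(self, n_experts, d_in, d_out, seed=0):
        rng = np.random.default_rng(seed)
        self._weights = {i: rng.normal(size=(d_in, d_out)) for i in range(n_experts)}

    def fetch(self, expert_id):
        # In a real system this would be an asynchronous copy into GPU memory.
        return self._weights[expert_id]

def serve_token(x, gate_scores, store, k=2):
    """Run only the top-k experts for one token, fetching their weights on demand."""
    top = np.argsort(gate_scores)[-k:]
    w = gate_scores[top] / gate_scores[top].sum()
    return sum(wi * (x @ store.fetch(i)) for wi, i in zip(w, top))

store = ExpertStore(n_experts=8, d_in=16, d_out=16)
x = np.ones(16)
gate_scores = np.abs(np.random.default_rng(1).normal(size=8))
print(serve_token(x, gate_scores, store).shape)   # (16,)
```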
-
Study finds switchless networks more cost-effective for MoE LLM serving
A new paper analyzes network topologies for Mixture-of-Experts (MoE) Large Language Model (LLM) serving, finding that lower-cost, switchless networks can be more cost-effective than expensive scale-up infrastructures. T…
-
Mixture of Experts framework speeds up atomistic simulations
Researchers have developed a new Mixture-of-Experts (MoE) framework for Machine Learning Interatomic Potentials (MLIPs) to accelerate atomistic simulations. This approach divides simulation domains into regions of varyi…
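The summary is cut off before it names the partitioning criterion or the potentials involved, so the snippet below is only a schematic of the region-routed evaluation pattern, with placeholder potentials and a placeholder region test; none of it is the paper's actual method.

```python
import numpy as np

def cheap_potential(positions):
    # Stand-in for a low-cost MLIP expert (placeholder energy function).
    return 0.1 * np.sum(positions ** 2, axis=-1)

def accurate_potential(positions):
    # Stand-in for an expensive, high-accuracy MLIP expert (placeholder).
    return 0.1 * np.sum(positions ** 2, axis=-1) + 0.01 * np.sum(np.cos(positions), axis=-1)

def region_routed_energy(positions, hot_region_mask):
    """Evaluate each atom with the expert assigned to its region, then sum energies.

    hot_region_mask marks atoms in regions that need the accurate expert;
    the real routing criterion is whatever the paper uses, this flag is a placeholder.
    """
    e = np.empty(len(positions))
    e[~hot_region_mask] = cheap_potential(positions[~hot_region_mask])
    e[hot_region_mask] = accurate_potential(positions[hot_region_mask])
    return e.sum()

rng = np.random.default_rng(3)
positions = rng.normal(size=(100, 3))
hot = np.linalg.norm(positions, axis=-1) < 1.0     # placeholder region criterion
print(region_routed_energy(positions, hot))
```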
-
Liquid AI releases LFM2-24B-A2B, an efficient 24B parameter MoE model
Liquid AI has released an early checkpoint of its LFM2-24B-A2B model, a sparse Mixture of Experts (MoE) architecture with 24 billion total parameters and 2 billion active parameters per token. This model demonstrates th…
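A quick back-of-envelope reading of those two figures, taking the stated totals at face value:

```python
total_params  = 24e9   # total parameters in the MoE
active_params = 2e9    # parameters used per token (active experts plus shared layers)

print(f"active fraction per token: {active_params / total_params:.1%}")   # 8.3%
# Rough intuition: per-token compute scales with active parameters, so the
# forward pass costs roughly what a ~2B dense model would, while the model
# can draw on 24B parameters of capacity.
```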
-
FaaSMoE offers resource-efficient, serverless serving for multi-tenant Mixture-of-Experts models
Researchers have developed FaaSMoE, a novel serverless framework designed for serving Mixture-of-Experts (MoE) models in multi-tenant environments. This architecture deploys individual experts as stateless functions on …
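This is not FaaSMoE's API; the sketch below, with hypothetical names (`expert_fn`, `route_and_invoke`), only illustrates the stateless-expert dispatch pattern the summary describes: each expert is a pure function keyed by id, and gateway-side routing groups tokens by selected expert and invokes each expert function once per group.

```python
import numpy as np

# Hypothetical registry of expert "functions": each call is stateless, so the
# platform can scale instances per expert and share them across tenants.
EXPERT_WEIGHTS = {i: np.random.default_rng(i).normal(size=(16, 16)) for i in range(8)}

def expert_fn(expert_id, tokens):
    """One serverless invocation: a pure function of (expert_id, tokens)."""
    return tokens @ EXPERT_WEIGHTS[expert_id]

def route_and_invoke(tokens, gate_scores, k=2):
    """Gateway-side logic: pick top-k experts per token, group tokens by expert,
    invoke each expert function once per group, then recombine.
    (Gate renormalisation over the selected experts is omitted for brevity.)"""
    out = np.zeros_like(tokens)
    top = np.argsort(gate_scores, axis=-1)[:, -k:]              # (n_tokens, k)
    for expert_id in np.unique(top):
        mask = (top == expert_id).any(axis=-1)                  # tokens routed here
        w = gate_scores[mask, expert_id][:, None]
        out[mask] += w * expert_fn(expert_id, tokens[mask])     # one "invocation"
    return out

tokens = np.random.default_rng(42).normal(size=(10, 16))
gate_scores = np.abs(np.random.default_rng(7).normal(size=(10, 8)))
print(route_and_invoke(tokens, gate_scores).shape)              # (10, 16)
```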
-
New framework uses physics-informed transfer learning for multi-site emission control
Researchers have developed a new physics-informed transfer learning framework designed to improve emission control in municipal solid waste incineration. This framework utilizes a mixture-of-experts model to manage carb…
-
Mixture-of-Experts model applied to GlueX DIRC detector for physics analysis
Researchers have developed a Mixture-of-Experts (MoE) foundation model to streamline data analysis for the GlueX DIRC detector at Jefferson Lab. This unified framework handles fast simulation, particle identification, a…
-
NVIDIA launches Nemotron 3 Nano Omni multimodal AI model for agents
NVIDIA has released Nemotron 3 Nano Omni, a multimodal large language model capable of processing vision, audio, video, and text simultaneously. This open model, built on a Mamba2 Transformer Hybrid Mixture of Experts a…
-
Poolside AI releases open-weight Laguna XS.2 and M.1 coding models
Poolside AI has released two new agentic coding models, Laguna M.1 and Laguna XS.2, along with their agent training and operation runtime. Laguna M.1 is a large Mixture of Experts (MoE) model trained on 30T tokens using…
-
NVIDIA launches Nemotron 3 Nano Omni, unifying multimodal AI for efficiency
NVIDIA has released Nemotron 3 Nano Omni, an open multimodal model capable of processing text, images, audio, and video. This model aims to unify these modalities into a single architecture, improving efficiency and ena…
-
AI models achieve 10x intelligence gains via Mixture of Experts and Transformer architectures
The Transformer architecture, introduced in the paper "Attention Is All You Need," revolutionized AI by enabling models to process information more efficiently. This innovation is key to understanding how models like Op…
-
New framework uses multiple LLMs to reduce hallucination and bias
Researchers have developed a new framework called Council Mode designed to mitigate hallucinations and biases in Large Language Models. This approach involves querying multiple diverse LLMs simultaneously and then synth…