PulseAugur
mixture of experts

PulseAugur coverage of mixture of experts — every cluster mentioning mixture of experts across labs, papers, and developer communities, ranked by signal.

Total · 30d: 100 (100 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 76 (76 over 90d)
TIMELINE
  1. 2026-05-11 research_milestone A new paper proposes an enhanced Mixture-of-Experts framework for faster time series forecasting model training.
SENTIMENT · 30D

19 days with sentiment data

RECENT · PAGE 4/5 · 100 TOTAL
  1. RESEARCH · CL_21794

    New parameter E predicts Mixture-of-Experts model health, preventing dead experts.

    Researchers have introduced a new dimensionless control parameter, E = T*H/(O+B), to predict the health of expert ecologies in Mixture-of-Experts (MoE) models. This parameter, derived from four hyperparameters, can prev…

  2. TOOL · CL_20870

    Zyphra's ZAYA1-8B MoE model trained on AMD hardware outperforms larger rivals

    Zyphra AI has released ZAYA1-8B, a Mixture of Experts (MoE) language model with 760 million active parameters and 8.4 billion total parameters. Trained on AMD hardware, this model demonstrates competitive performance ag…

  3. TOOL · CL_20549

    Tropical geometry reveals sparsity is combinatorial depth in MoE models

    A new paper introduces a theoretical framework for understanding Mixture-of-Experts (MoE) models using tropical geometry. The research establishes that the routing mechanism in MoE architectures is equivalent to a speci…

  4. TOOL · CL_20547

    MoLF model predicts pan-cancer gene expression from histology images

    Researchers have developed MoLF, a novel generative model designed for predicting pan-cancer spatial gene expression from histology images. This model utilizes a conditional Flow Matching objective and a Mixture-of-Expe…

  5. TOOL · CL_20383

    LAWS architecture offers self-certifying inference caching for LLMs and robotics

    Researchers have introduced LAWS, a novel caching architecture designed to improve the efficiency of neural inference, robotics, and edge deployments. This system builds a library of certified expert functions by observ…

  6. RESEARCH · CL_20274

    Geometry-aware model advances whole-slide image analysis in computational pathology

    Researchers have developed BatMIL, a novel framework for analyzing whole-slide histopathological images. This approach utilizes a hybrid hyperbolic-Euclidean representation to better capture hierarchical tissue structur…

  7. RESEARCH · CL_18472

    NVIDIA open-sources cuDNN kernels after 12 years, including MoE and sparse attention

    NVIDIA has open-sourced parts of its cuDNN library, a significant move after 12 years of it being closed-source. This release includes over 20 Mixture-of-Experts (MoE) kernels and NSA sparse attention kernels. The codeb…

  8. TOOL · CL_18630

    SMoE paper proposes expert substitution for efficient edge MoE deployment

    Researchers have developed SMoE, a novel algorithm-system co-design aimed at enabling Mixture of Experts (MoE) models to run on edge devices. This approach tackles memory limitations by dynamically offloading experts an…

  9. TOOL · CL_20119

    Apple researchers unveil SpecMD for faster MoE model inference

    Apple's machine learning research team has published a paper detailing SpecMD, a new framework for evaluating Mixture-of-Experts (MoE) model caching policies. Their experiments show that traditional caching assumptions …

  10. RESEARCH · CL_18667

    RD-ViT cuts data needs for segmentation, outperforming standard ViT with fewer parameters

    Researchers have developed RD-ViT, a novel Recurrent-Depth Vision Transformer designed for semantic segmentation tasks. This architecture significantly reduces data dependence by using a single, shared transformer block…

  11. RESEARCH · CL_18682

    OneTrackerV2 unifies multimodal visual tracking with Dual Mixture-of-Experts

    Researchers have developed a new event-based visual object tracking framework that addresses limitations of existing methods by explicitly modeling event density variations across multiple temporal scales. This approach…

  12. TOOL · CL_16235

    RAST-MoE-RL framework enhances ride-hailing efficiency with specialized AI experts

    Researchers have developed a new framework called RAST-MoE-RL to improve efficiency in ride-hailing services. This framework utilizes a Mixture-of-Experts (MoE) approach within deep reinforcement learning to better hand…

  13. TOOL · CL_15969

    Attention Sink research reveals inherent MoE structure in LLM attention layers

    Researchers have identified that the attention sink phenomenon in Large Language Models, where the first token receives disproportionate attention, naturally forms a Mixture-of-Experts (MoE) mechanism within attention l…

  14. RESEARCH · CL_14912

    Xiaomi unveils MiMo-V2.5-Pro AI model for automated programming tasks

    Xiaomi has unveiled its MiMo-V2.5-Pro language model, designed to automate complex programming tasks. Leveraging a Mixture-of-Experts architecture and reduced token requirements, the model can handle processes that prev…

  15. RESEARCH · CL_15510

    Mamoda2.5 model integrates multimodal AI with efficient DiT-MoE for top video editing

    Researchers have introduced Mamoda2.5, a unified AR-Diffusion framework designed for multimodal understanding and generation. This model utilizes a Diffusion Transformer backbone enhanced with a Mixture-of-Experts (MoE)…

  16. RESEARCH · CL_14460

    Researchers explore quantum neural networks via mixture of experts

    Researchers have established a mean-field limit for Mixture of Experts (MoE) models trained using gradient flow in supervised learning scenarios. Their findings demonstrate that as the number of experts increases, the m…

  17. RESEARCH · CL_14045

    GMGaze model achieves SOTA gaze estimation with CLIP and multiscale transformer

    Researchers have introduced GMGaze, a novel approach to gaze estimation that utilizes a multi-scale transformer architecture and incorporates context-aware conditioning. This method addresses limitations in existing mod…

  18. RESEARCH · CL_14047

    LightKV reduces LVLM KV cache size and computation by compressing vision tokens

    Researchers have developed LightKV, a new method to reduce the GPU memory overhead associated with Large Vision-Language Models (LVLMs). By exploiting redundancy in vision-token embeddings and using prompt-aware guidanc…

  19. RESEARCH · CL_11925

    FluxMoE system decouples expert weights for faster LLM serving

    Researchers have developed FluxMoE, a new system designed to improve the efficiency of serving Mixture-of-Experts (MoE) models. FluxMoE addresses the challenge of large parameter sizes in MoE models by decoupling expert…

  20. RESEARCH · CL_14183

    Study finds switchless networks more cost-effective for MoE LLM serving

    A new paper analyzes network topologies for Mixture-of-Experts (MoE) Large Language Model (LLM) serving, finding that lower-cost, switchless networks can be more cost-effective than expensive scale-up infrastructures. T…