Entity: mixture of experts

mixture of experts

PulseAugur coverage of mixture of experts: every cluster mentioning mixture of experts across labs, papers, and developer communities, ranked by signal.

Total · 30 days

90 days: 74

Releases · 30 days

90 days: 0

Papers · 30 days

90 days: 59

Tier distribution · 90 days

frontier release 4
significant 3
research 30
tool 36
commentary 1

Relations

instance of arXiv 90%
instance of Innu-aimun 90%
instance of DeepSeek V4-Flash 90%
uses large-language models 80%
used by large-language models 80%
instance of transformers 70%
uses LLM 70%
used by transformer 70%
instance of transformer 70%
used by Emo 70%
developed by Emo 70%
competes with transformers 50%

Timeline

2026-05-11 research_milestone A new paper proposes an enhanced Mixture-of-Experts framework for faster time series forecasting model training. Source

Sentiment · 30 days

12 days with sentiment data

Recent · Page 4 of 4 · 74 items total

RESEARCH · CL_07734 · Apr 28 · 16:17

Poolside AI releases open-weight Laguna XS.2 and M.1 coding models

Poolside AI has released two new agentic coding models, Laguna M.1 and Laguna XS.2, along with their agent training and operation runtime. Laguna M.1 is a large Mixture of Experts (MoE) model trained on 30T tokens using…
FRONTIER RELEASE · CL_07750 · Apr 28 · 16:09

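NVIDIA releases Nemotron 3 Nano Omni multimodal AI model for agents

NVIDIA has released Nemotron 3 Nano Omni, a multimodal large language model that can process vision, audio, video, and text together. The open model is built on a Mamba2 Transformer Mixture-of-Experts architecture and is intended to strengthen enterprise agent workflows by enabling a single multimodal understanding and reasoning loop. It is now available on Fireworks and Amazon SageMaker JumpStart, offers a 131K context length, and is licensed for commercial use.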
RESEARCH · CL_07230 · Apr 28 · 08:00

AI models achieve 10x intelligence gains via Mixture of Experts and Transformer architectures

The Transformer architecture, introduced in the paper "Attention Is All You Need," revolutionized AI by enabling models to process information more efficiently. This innovation is key to understanding how models like Op…
RESEARCH · CL_06713 · Apr 28 · 04:00

New framework uses multiple LLMs to reduce hallucination and bias

Researchers have developed a new framework called Council Mode designed to mitigate hallucinations and biases in Large Language Models. This approach involves querying multiple diverse LLMs simultaneously and then synth…
RESEARCH · CL_06701 · Apr 28 · 04:00

New simulator stress-tests AI emotional support chatbots with diverse user profiles

Researchers have developed a new controllable simulator to better evaluate emotional support chatbots. This simulator addresses limitations in current systems by incorporating diverse psychological and linguistic featur…
FRONTIER RELEASE · CL_07710 · Apr 27 · 19:49

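NVIDIA releases Nemotron 3 Nano Omni, unifying multimodal AI for greater efficiency

NVIDIA has released Nemotron 3 Nano Omni, an open multimodal model that can process text, images, audio, and video. The model is designed to unify these modalities in a single architecture, improving efficiency and enabling more sophisticated AI agents. Nemotron 3 Nano Omni performs strongly on benchmarks for document intelligence, audio understanding, and video analysis, with significant gains in throughput and inference speed over previous models and alternatives.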
RESEARCH · CL_06215 · Apr 27 · 03:23

SMoES improves MoE-VLM efficiency and effectiveness with soft modality guidance

Researchers have introduced SMoES, a novel approach for guiding expert routing in Mixture-of-Experts (MoE) vision-language models (VLMs). This method utilizes dynamic soft modality scores to account for layer-dependent …
RESEARCH · CL_03609 · Apr 24 · 16:44

Researchers propose new methods to decouple model parameters from computation

Researchers have introduced novel methods to decouple model size from computational cost in deep learning. One approach, 'hash layers,' allows for larger models with fewer computational operations by using hashing for e…
FRONTIER RELEASE · CL_00752 · Apr 24 · 13:30

DeepSeek previews new AI model that ‘closes the gap’ with frontier models

DeepSeek has released its V4 AI model, featuring two versions: V4-Pro and V4-Flash. These models boast a 1 million token context window and utilize a mixture-of-experts architecture for efficiency. While DeepSeek V4 aim…
RESEARCH · CL_05070 · Apr 24 · 11:35

AI research explores functorial formulations, causal learning, and adaptive model merging

Researchers have developed a multi-fidelity surrogate modeling framework to predict wind loads on container ships, combining empirical data with CFD simulations for improved accuracy and reduced computational cost. Anot…
RESEARCH · CL_02843 · Apr 22 · 13:37

New MoE Architectures Enhance Efficiency and Performance

Researchers are developing advanced techniques to improve Mixture-of-Experts (MoE) models, particularly addressing challenges in domain transitions and inference efficiency. One approach, inspired by the Free Energy Pri…
RESEARCH · CL_03266 · Apr 22 · 00:05

Cohere details how MoE models boost speculative decoding effectiveness

Cohere has released a technical report detailing how Mixture-of-Experts (MoE) models can enhance speculative decoding. Contrary to initial expectations, the research indicates that MoE architectures actually improve the…
FRONTIER RELEASE · CL_47594 · Apr 13 · 09:12

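Qwen releases 27B multimodal model for advanced coding

Qwen has released Qwen3.6-27B, a dense multimodal model with 27 billion parameters designed for advanced coding tasks. The model aims to deliver flagship-level agentic coding performance, surpassing earlier open-source models in its class. Community members have already published various quantized versions of Qwen3.6-27B on Hugging Face, making it easier to use across platforms and libraries.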
FRONTIER RELEASE · CL_00821 · Jan 19 · 04:00

DeepSeek v3 leads open-weight models, Baseten enables mission-critical inference

DeepSeek v3, a new 671B parameter Mixture-of-Experts model, has been released and is currently the top-performing open-weights model available. Serving such large models presents significant challenges, but inference st…
