DeepSeek MoE
PulseAugur coverage of DeepSeek MoE: every cluster mentioning DeepSeek MoE across labs, papers, and developer communities, ranked by signal.
-
Mixture of Experts: Performance Gains with Memory Trade-offs
Mixture of Experts (MoE) models offer a way to achieve high performance with lower computational cost per token by activating only a subset of their parameters. While models like Mixtral 8x7B, DeepSeek-MoE, and Qwen2.5-…
-
JetBrains releases efficient Mellum2 MoE model; research advances MoE techniques
JetBrains has released Mellum2, an open-source 12-billion-parameter Mixture-of-Experts (MoE) model optimized for efficient inference in text and code tasks. This model activates only a fraction of its parameters per tok…
-
HEAPr algorithm precisely prunes LLM experts, cutting memory needs
Researchers have developed HEAPr, a new pruning algorithm designed to reduce the memory footprint of Mixture-of-Experts (MoE) large language models. Unlike previous methods that prune entire experts, HEAPr breaks down e…
-
New framework enhances MoE LLMs on noisy analog hardware
Researchers have introduced ROMER, a post-training calibration framework designed to enhance the robustness of Mixture-of-Experts (MoE) Large Language Models (LLMs) when deployed on analog Compute-in-Memory (CIM) system…