Brief

last 24h

[2/2] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.AI English(EN) · 2w

HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space

Researchers have developed HEAPr, a pruning algorithm that reduces the memory footprint of Mixture-of-Experts (MoE) large language models. Unlike previous methods that prune entire experts, HEAPr breaks experts down into smaller atomic units and prunes at that granularity. It uses second-order information from atomic expert outputs, which reduces computational complexity and enables more precise compression with minimal accuracy loss. Experiments on models such as DeepSeek MoE and Qwen MoE show nearly lossless compression at pruning ratios up to 25%, with FLOPs reduced by a similar margin.
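To make the idea of "atomic units" concrete, the sketch below splits a single expert into one unit per hidden neuron and checks that the expert's output is the sum of the unit outputs. It assumes a SwiGLU-style gated feed-forward expert. The importance score here is a plain mean output energy, a simple stand-in and not HEAPr's Hessian-based criterion. All function names are hypothetical.

```python
import math


def silu(z):
    """SiLU activation, assumed here as the expert's gate nonlinearity."""
    return z / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def atomic_outputs(x, w_gate, w_up, w_down_cols):
    """Return one output vector per atomic unit (one hidden neuron each).

    w_gate[i] and w_up[i] are the input weights of neuron i;
    w_down_cols[i] is the output direction that neuron i writes to.
    """
    outs = []
    for g, u, d in zip(w_gate, w_up, w_down_cols):
        act = silu(dot(g, x)) * dot(u, x)
        outs.append([act * v for v in d])
    return outs


def expert_output(x, w_gate, w_up, w_down_cols):
    """The full expert output is the sum of its atomic unit outputs."""
    outs = atomic_outputs(x, w_gate, w_up, w_down_cols)
    return [sum(col) for col in zip(*outs)]


def energy_scores(calib_inputs, w_gate, w_up, w_down_cols):
    """Stand-in importance: mean squared norm of each unit's output.

    This is NOT the second-order score used by HEAPr.
    """
    scores = [0.0] * len(w_gate)
    for x in calib_inputs:
        units = atomic_outputs(x, w_gate, w_up, w_down_cols)
        for i, out in enumerate(units):
            scores[i] += dot(out, out) / len(calib_inputs)
    return scores


def keep_indices(scores, prune_ratio):
    """Indices of the units that survive after dropping the lowest-scoring ones."""
    n_prune = int(len(scores) * prune_ratio)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[n_prune:])
```

Under this decomposition, dropping a fraction of the hidden neurons removes the same fraction of the expert's weights and multiply-adds, which is consistent with the summary's note that FLOPs fall by roughly the pruning ratio.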

IMPACT Enables more efficient deployment of large MoE models by reducing memory requirements without significant performance degradation.

TOOL · arXiv cs.CL English(EN) · 1mo

ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems

Researchers have introduced ROMER, a post-training calibration framework that improves the robustness of Mixture-of-Experts (MoE) Large Language Models (LLMs) deployed on analog Compute-in-Memory (CIM) systems. The framework addresses hardware imperfections in CIM by replacing underutilized experts and recalibrating router decisions to maintain load balance and optimal routing under noisy conditions. In experiments, ROMER significantly reduces perplexity for DeepSeek-MoE, Qwen-MoE, and OLMoE under real-chip noise.
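As a rough illustration of what "underutilized experts" means under top-k routing, the sketch below counts how often each expert is selected over a batch of router logits and flags those below a cutoff. It covers only this measurement step. The function names and the threshold are hypothetical, and the summary does not say how ROMER chooses replacements or recalibrates the router.

```python
def top_k_experts(logits, k):
    """Indices of the k experts with the highest router logits for one token."""
    ranked = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)
    return ranked[:k]


def utilization(router_logits, k):
    """Fraction of all routing slots that each expert receives."""
    n_experts = len(router_logits[0])
    counts = [0] * n_experts
    for logits in router_logits:
        for e in top_k_experts(logits, k):
            counts[e] += 1
    total_slots = len(router_logits) * k
    return [c / total_slots for c in counts]


def underutilized(util, threshold):
    """Experts whose share of routing slots falls below a hypothetical threshold."""
    return [e for e, u in enumerate(util) if u < threshold]


# Toy router logits for three tokens and four experts (made-up values).
router_logits = [
    [2.0, 1.0, 0.1, -1.0],
    [1.5, 2.5, 0.0, -2.0],
    [3.0, 0.2, 0.5, -0.5],
]
util = utilization(router_logits, k=2)
flagged = underutilized(util, threshold=0.2)
```

In this toy batch, experts 2 and 3 receive fewer than 20% of the routing slots and are flagged.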

IMPACT Improves the viability of deploying LLMs on energy-efficient analog hardware by mitigating noise-induced performance degradation.
- LLMs
- ROMER
- CIM
- DeepSeek-MoE
- Qwen-MoE
- OLMoE
