ExFusion method enhances Transformer training efficiency via multi-expert fusion

By PulseAugur Editorial · [1 source] · 2026-07-03 04:00

Researchers have developed ExFusion, a pre-training approach designed to make Transformer training more efficient. At initialization, the method upcycles the feed-forward network (FFN) into a multi-expert configuration and assigns weights for later parameter fusion. During training, the experts are fused into a single unified expert, which keeps the added computational cost over standard dense training small. Because only the fused expert remains after training, the approach adds no extra storage or deployment overhead. According to the paper, experiments across computer vision and natural language processing tasks demonstrate its effectiveness.
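To make the fusion idea in the summary above concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the helper names (`make_ffn`, `upcycle`, `fuse`, `ffn_forward`), the uniform initial fusion weights, the normalized weighted sum, and the ReLU activation are all assumptions chosen for illustration, since the summary does not specify how the weights are set or updated.

```python
import random

def make_ffn(d_model, d_hidden, rng):
    """Create one dense FFN as two weight matrices (biases omitted for brevity)."""
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
    return {"w1": w1, "w2": w2}

def upcycle(ffn, num_experts):
    """Copy the dense FFN into several experts and give each a fusion weight.
    Uniform starting weights are an assumption of this sketch."""
    experts = [
        {name: [row[:] for row in mat] for name, mat in ffn.items()}
        for _ in range(num_experts)
    ]
    fusion_weights = [1.0 / num_experts] * num_experts
    return experts, fusion_weights

def fuse(experts, fusion_weights):
    """Merge expert parameters into one FFN of the original shape
    using a normalized weighted sum (assumed fusion rule)."""
    total = sum(fusion_weights)
    coeffs = [w / total for w in fusion_weights]
    fused = {}
    for name in experts[0]:
        rows = len(experts[0][name])
        cols = len(experts[0][name][0])
        fused[name] = [
            [sum(c * e[name][i][j] for c, e in zip(coeffs, experts))
             for j in range(cols)]
            for i in range(rows)
        ]
    return fused

def ffn_forward(ffn, x):
    """Forward pass through a single FFN: linear, ReLU, linear."""
    d_hidden = len(ffn["w1"][0])
    d_out = len(ffn["w2"][0])
    hidden = [
        max(0.0, sum(x[i] * ffn["w1"][i][j] for i in range(len(x))))
        for j in range(d_hidden)
    ]
    return [sum(hidden[j] * ffn["w2"][j][k] for j in range(d_hidden))
            for k in range(d_out)]

# Demo: the fused expert has the same shape as the dense FFN, so the
# forward pass costs one FFN evaluation regardless of the expert count.
rng = random.Random(0)
dense = make_ffn(4, 8, rng)
experts, weights = upcycle(dense, 3)
fused = fuse(experts, weights)
x = [0.5, -1.0, 2.0, 0.25]
y_dense = ffn_forward(dense, x)
y_fused = ffn_forward(fused, x)
print(max(abs(a - b) for a, b in zip(y_dense, y_fused)))
```

In this sketch the experts start as identical copies, so the fused FFN reproduces the dense one up to floating-point rounding. Once the experts diverge during training, fusion still yields a single FFN of the original size, which is why nothing extra needs to be stored or deployed afterward.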

IMPACT This method could lead to more efficient training of large AI models, reducing computational costs and deployment overhead.

RANK_REASON The cluster contains a research paper detailing a new method for training Transformer models.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Jiacheng Ruan, Daize Dong, Xiaoye Qu, Tong Zhu, Ting Liu, Yuzhuo Fu, Yu Cheng, Suncheng Xiang · 2026-07-03 04:00

ExFusion: Efficient Transformer Training via Multi-Experts Fusion

arXiv:2603.27965v2 Announce Type: replace Abstract: Mixture-of-Experts (MoE) models substantially improve performance by increasing the capacity of dense architectures. However, directly training MoE models requires considerable computational resources and introduces extra overhe…

