CLEAR-MoE converts frozen Vision Transformers to sparse MoE models

By PulseAugur Editorial · [1 source] · 2026-06-30 04:00

Researchers have developed CLEAR-MoE, a post-training method that converts frozen Vision Transformers (ViTs) into sparse Mixture-of-Experts (MoE) models without altering the original backbone weights. The four-phase pipeline scores and decomposes feed-forward network (FFN) layers, trains lightweight routers, and dispatches tokens to experts. In experiments on several ViT backbones, CLEAR-MoE retained nearly all of the dense model's accuracy, and the shared singular value decomposition (SVD) basis proved crucial for preserving performance. Routing overhead makes FFN execution slightly slower, but the approach shows promise as an efficient way to create MoE models from existing ViTs.
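The summary does not spell out how the shared SVD basis and token dispatch fit together, so the following is only a minimal NumPy sketch of one plausible reading, not the paper's implementation. It assumes a bias-free two-matrix FFN, experts formed by evenly splitting hidden units, an untrained random router standing in for the trained one, and an unweighted sum over selected experts; the names `extract_experts`, `moe_ffn`, and `router_w` are hypothetical, and the layer-scoring and router-training phases are omitted.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, applied elementwise
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def dense_ffn(x, w1, w2):
    # Reference dense FFN: x (tokens, d_model), w1 (hidden, d_model), w2 (d_model, hidden)
    return gelu(x @ w1.T) @ w2.T

def extract_experts(w1, w2, num_experts, rank):
    # Factor the frozen up-projection once; the backbone weights are only read, never updated.
    u, s, vt = np.linalg.svd(w1, full_matrices=False)
    basis = vt[:rank]                  # (rank, d_model), shared by every expert
    coeffs = u[:, :rank] * s[:rank]    # (hidden, rank), per-hidden-unit coefficients
    # Assumption: experts are contiguous, equal-sized groups of hidden units.
    groups = np.array_split(np.arange(w1.shape[0]), num_experts)
    experts = [(coeffs[g], w2[:, g]) for g in groups]
    return basis, experts

def moe_ffn(x, basis, experts, router_w, top_k):
    z = x @ basis.T                    # project every token into the shared basis once
    logits = x @ router_w.T            # (tokens, num_experts) router scores
    chosen = np.argsort(-logits, axis=1)[:, :top_k]
    out = np.zeros((x.shape[0], experts[0][1].shape[0]))
    for e, (c_e, w2_e) in enumerate(experts):
        mask = (chosen == e).any(axis=1)   # tokens dispatched to expert e
        if mask.any():
            out[mask] += gelu(z[mask] @ c_e.T) @ w2_e.T
    return out

rng = np.random.default_rng(0)
d_model, hidden, num_experts = 8, 16, 4
w1 = rng.normal(size=(hidden, d_model))
w2 = rng.normal(size=(d_model, hidden))
router_w = rng.normal(size=(num_experts, d_model))  # untrained stand-in router
x = rng.normal(size=(5, d_model))

basis, experts = extract_experts(w1, w2, num_experts, rank=d_model)
full = moe_ffn(x, basis, experts, router_w, top_k=num_experts)
sparse = moe_ffn(x, basis, experts, router_w, top_k=1)
```

Under these assumptions, activating every expert at full rank reproduces the dense FFN output, which gives a simple sanity check on the decomposition. Lowering `top_k` or `rank` trades accuracy for compute, and the per-expert masking loop is the kind of routing overhead the summary mentions.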

IMPACT Enables efficient creation of sparse Mixture-of-Experts models from existing Vision Transformers without retraining the backbone.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Md Irtiza Hossain, Humaira Ayesha, Junaid Ahmed Sifat · 2026-06-30 04:00

CLEAR-MoE: Shared-Basis Expert Extraction from Frozen Vision Transformers via Calibration-Driven Layer Selection

arXiv:2606.28516v1. Abstract: We present CLEAR-MoE, a four-phase post-training pipeline that converts a frozen pretrained Vision Transformer (ViT) into a sparse Mixture-of-Experts (MoE) model without updating backbone weights. The pipeline (i) scores feed-forwar…
