New LoMC Framework Enhances Refusal Suppression in Routed Foundation Models

By PulseAugur Editorial · [2 sources] · 2026-06-10 08:02

Researchers have developed a framework called Localized Multidirectional Correction (LoMC) for controlled refusal suppression in routed Mixture-of-Experts (MoE) and hybrid-MoE foundation models. LoMC aims to increase non-refusal target-response behavior while preserving general capability, and it does so by confining its corrections to specific model components. The method identifies an edit support, aggregates multiple correction directions, and applies rank-one, layer-wise corrections only within that support. This raises correction capacity without widening the scope of the intervention. According to the authors, experiments on several safety benchmarks show LoMC improving the targeted behavior across different routed model architectures.

IMPACT Introduces a localized technique for controlling refusal behavior in complex routed AI models while limiting the size of the intervention.

RANK_REASON The cluster contains an academic paper describing a new method for controlling refusal behavior in AI models.


paper
safety

AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

arXiv stat.ML TIER_1 English(EN) · Yan Hong, Kedong Xiu, Wei Li, Jun Lan, Huijia Zhu, Shuheng Zhou, Zhongcai Lyu, Weiqiang Wang, Jianfu Zhang · 2026-06-15 04:00

LoMC: Localized Multidirectional Correction for Refusal Suppression in Routed Foundation Models

arXiv:2606.13709v1 · Announce Type: new

Abstract: We study controlled post-training refusal suppression in routed MoE and hybrid-MoE foundation models, aiming to increase non-refusal target-response behavior while preserving general capability under a compact intervention footprint…
arXiv stat.ML TIER_1 English(EN) · Jianfu Zhang · 2026-06-10 08:02

LoMC: Localized Multidirectional Correction for Refusal Suppression in Routed Foundation Models

We study controlled post-training refusal suppression in routed MoE and hybrid-MoE foundation models, aiming to increase non-refusal target-response behavior while preserving general capability under a compact intervention footprint. Existing broad direction-based edits can pertu…
