Brief · PulseAugur

TOOL · arXiv cs.CL · English (EN) · 1w

RouteScan: A Non-Intrusive Approach to Auditing MoE LLMs Safety via Expert Routing Telemetry

Researchers have developed RouteScan, a framework for auditing the safety of Mixture-of-Experts (MoE) large language models (LLMs) without access to sensitive user data. The non-intrusive method analyzes low-level GPU execution telemetry, specifically expert routing patterns, to detect harmful behavior. In evaluations on open-source MoE models, RouteScan achieves high accuracy and generalizes to unseen harmful domains and novel jailbreak techniques, while offering a privacy advantage over content-based auditing.
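To make the idea of auditing from routing patterns concrete, here is a minimal sketch. It is not RouteScan's actual pipeline, which this brief does not describe. It assumes the telemetry has already been reduced to a list of expert IDs selected per token, and it substitutes a simple nearest-centroid comparison over expert-activation histograms for whatever detector the authors use. All function names and the toy traces are hypothetical.

```python
from collections import Counter
from math import sqrt


def routing_histogram(trace, num_experts):
    """Convert a routing trace (expert ID chosen per token) into a
    normalized activation-frequency vector of length num_experts."""
    if not trace:
        raise ValueError("routing trace is empty")
    counts = Counter(trace)
    total = len(trace)
    return [counts.get(e, 0) / total for e in range(num_experts)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def flag_trace(trace, benign_centroid, harmful_centroid, num_experts):
    """Return True when the trace's histogram is closer (by cosine
    similarity) to the harmful centroid than to the benign one."""
    h = routing_histogram(trace, num_experts)
    return cosine(h, harmful_centroid) > cosine(h, benign_centroid)


# Toy, made-up traces for a hypothetical 4-expert layer.
NUM_EXPERTS = 4
benign_traces = [[0, 0, 1, 0, 1, 0], [0, 1, 0, 0, 0, 1]]
harmful_traces = [[2, 3, 3, 2, 3, 3], [3, 2, 3, 3, 2, 2]]

benign_c = centroid([routing_histogram(t, NUM_EXPERTS) for t in benign_traces])
harmful_c = centroid([routing_histogram(t, NUM_EXPERTS) for t in harmful_traces])

suspicious = flag_trace([3, 3, 2, 3, 0, 2], benign_c, harmful_c, NUM_EXPERTS)
ordinary = flag_trace([0, 1, 0, 0, 2, 0], benign_c, harmful_c, NUM_EXPERTS)
```

The sketch only ever sees expert IDs, never prompt or response text, which is the privacy property the brief attributes to routing-based auditing. A real detector would need labeled traces from an actual model and likely per-layer features.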

IMPACT Offers a privacy-preserving method for LLM safety auditing, potentially enabling broader deployment of MoE models.

RouteScan
Mixture-of-Experts (MoE) LLMs