PulseAugur

RouteScan audits MoE LLM safety using non-intrusive routing telemetry

Researchers have developed RouteScan, a framework for auditing the safety of Mixture-of-Experts (MoE) Large Language Models (LLMs) without requiring access to sensitive user data. Instead of inspecting request content, this non-intrusive method analyzes low-level GPU execution telemetry, specifically expert-routing patterns, to detect harmful behavior. In evaluations on open-source MoE models, RouteScan generalizes well and detects harmful behavior accurately, including in unseen harmful domains and against novel jailbreak techniques. It also offers a privacy advantage over content-based auditing.
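To illustrate the general idea of auditing from routing behavior rather than text, here is a toy sketch. It is not RouteScan's method, and its names (`routing_features`, `NearestCentroidAuditor`) are hypothetical. It assumes expert selections have already been recovered per token and per MoE layer. The summary says RouteScan works from GPU execution telemetry, and that recovery step is not shown here. Each request is reduced to normalized per-layer expert-usage frequencies, and a simple nearest-centroid classifier stands in for whatever detector a real auditor would train.

```python
from typing import Sequence

# Hypothetical telemetry format (an assumption, not RouteScan's): for each token,
# for each MoE layer, the expert indices the router selected (e.g. top-2).
Trace = Sequence[Sequence[Sequence[int]]]


def routing_features(trace: Trace, num_layers: int, num_experts: int) -> list[float]:
    """Turn one request's routing trace into normalized per-layer expert-usage frequencies."""
    counts = [[0] * num_experts for _ in range(num_layers)]
    for token in trace:
        for layer, experts in enumerate(token):
            for expert in experts:
                counts[layer][expert] += 1
    features: list[float] = []
    for layer_counts in counts:
        total = sum(layer_counts) or 1  # guard against layers with no routing events
        features.extend(c / total for c in layer_counts)
    return features


class NearestCentroidAuditor:
    """Toy detector: labels a trace by the closest per-class mean feature vector.

    A stand-in for a learned classifier; no prompt or output text is ever used.
    """

    def __init__(self, num_layers: int, num_experts: int) -> None:
        self.num_layers = num_layers
        self.num_experts = num_experts
        self.centroids: dict[str, list[float]] = {}

    def fit(self, traces: Sequence[Trace], labels: Sequence[str]) -> "NearestCentroidAuditor":
        # Group feature vectors by label, then average each group into a centroid.
        grouped: dict[str, list[list[float]]] = {}
        for trace, label in zip(traces, labels):
            grouped.setdefault(label, []).append(
                routing_features(trace, self.num_layers, self.num_experts)
            )
        for label, vectors in grouped.items():
            self.centroids[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
        return self

    def predict(self, trace: Trace) -> str:
        # Pick the label whose centroid has the smallest squared Euclidean distance.
        x = routing_features(trace, self.num_layers, self.num_experts)
        return min(
            self.centroids,
            key=lambda label: sum((a - b) ** 2 for a, b in zip(x, self.centroids[label])),
        )


# Synthetic example: 2 MoE layers, 4 experts, top-2 routing per token.
benign = [[[0, 1], [0, 1]], [[0, 1], [1, 0]]]
harmful = [[[2, 3], [2, 3]], [[3, 2], [2, 3]]]
auditor = NearestCentroidAuditor(num_layers=2, num_experts=4).fit(
    [benign, harmful], ["benign", "harmful"]
)
print(auditor.predict([[[2, 3], [3, 2]]]))  # prints "harmful"
```

The synthetic traces are invented for illustration. Real routing signatures would be far noisier, and per the summary, RouteScan's contribution is showing that such signals generalize to unseen harmful domains and new jailbreaks.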

IMPACT Offers a privacy-preserving method for LLM safety auditing, potentially enabling broader deployment of MoE models.

RANK_REASON The cluster contains a research paper detailing a new method for auditing LLMs.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CL TIER_1 English (EN) · Bo Lv, Zhiheng Xu, KeDong Xiu, Ruyi Ding, Tianhang Zheng, Zhibo Wang, Kui Ren

    RouteScan: A Non-Intrusive Approach to Auditing MoE LLMs Safety via Expert Routing Telemetry

    arXiv:2605.24817v1 · Announce Type: cross

    Abstract: Mixture-of-Experts (MoE) architectures have become an increasingly important paradigm for scaling Large Language Models (LLMs). As MoE models are increasingly deployed in real-world services, safety auditing becomes necessary to v…