Detecting Trojaned DNNs via Spectral Regression Analysis
Researchers have developed MIST, a method for detecting malicious Trojans embedded in deep neural networks during fine-tuning. The approach analyzes how the spectrum of a model's internal representations changes across updates and frames Trojan detection as a regression problem. MIST distinguishes benign model evolution from Trojaned updates by flagging spectral deviations that are inconsistent with normal behavior. It is reported to outperform existing methods and requires no knowledge of the poison data or the trigger.
AI IMPACT: Introduces a new technique for securing AI models against sophisticated poisoning attacks during development.
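
The summary gives no implementation details, so the following is only a rough sketch of the general idea (regressing post-update spectra on pre-update spectra and flagging large residuals), not MIST itself. The synthetic activations, the helper names (`spectrum`, `fit_spectral_regressor`, `deviation`), the simulated benign and Trojaned updates, and the 3-sigma threshold are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectrum(acts, k=8):
    """Log of the top-k singular values of mean-centered activations."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)  # sorted in descending order
    return np.log(s[:k])

def fit_spectral_regressor(before_specs, after_specs):
    """Least-squares map (with bias) from pre-update to post-update spectrum."""
    X = np.hstack([before_specs, np.ones((len(before_specs), 1))])
    W, *_ = np.linalg.lstsq(X, after_specs, rcond=None)
    return W

def deviation(W, before_spec, after_spec):
    """Norm of the regression residual; larger means a less 'normal' update."""
    pred = np.append(before_spec, 1.0) @ W
    return float(np.linalg.norm(after_spec - pred))

# --- Synthetic stand-ins for real layer activations (assumptions) ---
def sample_acts(n=200, d=32):
    return rng.uniform(0.5, 2.0) * rng.normal(size=(n, d))

def benign_update(acts):
    # Hypothetical benign fine-tuning step: slight rescaling plus small noise.
    return acts * (1 + rng.normal(0, 0.02)) + 0.01 * rng.normal(size=acts.shape)

def trojaned_update(acts, frac=0.1, strength=10.0):
    # Hypothetical Trojan: a strong shared direction on a small subset of inputs.
    out = benign_update(acts)
    direction = rng.normal(size=acts.shape[1])
    direction /= np.linalg.norm(direction)
    n_poison = int(frac * len(acts))
    out[:n_poison] += strength * acts.std() * direction
    return out

# Fit the regressor on benign (before, after) pairs only.
pairs = [(a, benign_update(a)) for a in (sample_acts() for _ in range(40))]
B = np.array([spectrum(a) for a, _ in pairs])
A = np.array([spectrum(b) for _, b in pairs])
W = fit_spectral_regressor(B, A)

# Threshold from benign residuals (3-sigma rule, chosen arbitrarily here).
train_scores = np.array([deviation(W, b, a) for b, a in zip(B, A)])
threshold = train_scores.mean() + 3 * train_scores.std()

# Score one held-out benign update and one simulated Trojaned update.
base = sample_acts()
benign_score = deviation(W, spectrum(base), spectrum(benign_update(base)))
trojan_score = deviation(W, spectrum(base), spectrum(trojaned_update(base)))
print(f"threshold={threshold:.3f} benign={benign_score:.3f} trojan={trojan_score:.3f}")
```

Note that the regressor is fit only on benign updates and never sees the simulated trigger or poisoned samples, which mirrors the summary's claim that no knowledge of the poison data or trigger is needed. In a real setting the activations would come from a fixed probe set passed through the model before and after each fine-tuning update.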