Beyond Waveform Robustness: Robust Feature-Vocoder Adversarial Attacks on Automatic Speech Recognition
Researchers have developed an adversarial attack on automatic speech recognition (ASR) systems that operates in feature space rather than directly on audio waveforms. The approach, termed the Clean-Referenced Feature-Vocoder Attack, aims to improve transferability to black-box ASR models and to bypass defenses that target waveform perturbations. By manipulating self-supervised learning representations and reconstructing audio from them with a vocoder, the attack significantly increased Word Error Rate (WER) on various ASR models, highlighting a gap in current robustness evaluations.
AI IMPACT: This research reveals a new vulnerability in ASR systems, potentially affecting the security and reliability of speech-to-text technologies.
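For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is a generic illustration of that definition, not the evaluation code used in this research; the function name `word_error_rate` and the example transcripts are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: a clean input versus an adversarially perturbed one.
clean_wer = word_error_rate("turn on the lights", "turn on the lights")
attacked_wer = word_error_rate("turn on the lights", "turn off the light")
print(clean_wer, attacked_wer)
```

A successful attack in this sense is one that pushes the second value well above the first while the audio still sounds acceptable to a listener. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.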