Drift-Augmented Scoring: Text-Derived Noise Robustness for Zero-Shot Audio-Language Classification
Researchers have developed Drift-Augmented Scoring (DAS), a method that makes zero-shot audio-language classification models more robust to acoustic noise. DAS adds a small bonus to each class's cosine score, rewarding a class when the noisy audio embedding aligns with that class's noise-conditioned text prompts. It raised accuracy by up to 5.75 points on UrbanSound8K and mAP by up to 1.74 points on FSD50K, and outperformed other methods across a range of noisy conditions.
AI IMPACT: Improves the reliability of audio-language models in real-world noisy environments, which could benefit applications such as voice assistants and content moderation.
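The scoring idea in the summary above (a cosine score plus a small bonus for alignment with noise-conditioned prompts) can be illustrated with a minimal sketch. The summary does not give the exact formula, so everything here is an assumption made for illustration: the function name `das_scores`, the `bonus_weight` parameter, the clipping of the bonus at zero, and the use of one noise-conditioned prompt embedding per class (for example, a prompt like "a dog barking with background noise") are all hypothetical and may differ from the published method.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def das_scores(audio_emb, clean_text_emb, noisy_text_emb, bonus_weight=0.1):
    """Hypothetical drift-augmented score for one audio clip.

    audio_emb:      (D,)   embedding of the (possibly noisy) audio clip
    clean_text_emb: (C, D) embeddings of the plain class prompts
    noisy_text_emb: (C, D) embeddings of noise-conditioned class prompts
    bonus_weight:   assumed scale of the bonus term (not from the source)
    """
    a = l2_normalize(np.asarray(audio_emb, dtype=float))
    t_clean = l2_normalize(np.asarray(clean_text_emb, dtype=float))
    t_noisy = l2_normalize(np.asarray(noisy_text_emb, dtype=float))

    base = t_clean @ a    # standard zero-shot cosine score per class, shape (C,)
    align = t_noisy @ a   # alignment with the noise-conditioned prompts, shape (C,)

    # Assumption: only positive alignment is rewarded, scaled by a small weight.
    bonus = bonus_weight * np.maximum(align, 0.0)
    return base + bonus


def predict(audio_emb, clean_text_emb, noisy_text_emb, bonus_weight=0.1):
    """Return the index of the highest-scoring class."""
    scores = das_scores(audio_emb, clean_text_emb, noisy_text_emb, bonus_weight)
    return int(np.argmax(scores))
```

Setting `bonus_weight=0` in this sketch recovers plain cosine scoring, which makes it easy to compare the two rules on the same embeddings. In practice the embeddings would come from an audio-language model's audio and text encoders, which are not specified in the summary.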