Acoustic Prompting via Stage-wise Modulation for Few-Shot Learning in Audio Language Models
Researchers have developed a new framework for Audio Language Models (ALMs) that introduces trainable prompts directly into the audio encoder. This approach aims to capture task-specific acoustic features, enhancing few-shot adaptation by complementing existing text-side prompt learning methods. Experiments across 11 datasets indicate that this plug-and-play module generally improves performance when integrated with text prompt tuning, suggesting that explicit modulation of the audio representation space is effective. AI