Brief · PulseAugur

TOOL · arXiv cs.CL English(EN) · 6h

Beyond Text Following: Repairable Arbitration Reversals in Audio-Language Models

Researchers have identified a failure mode in audio-language models: when text input conflicts with clear audio evidence, the text overrides the audio and the model produces incorrect outputs. In over 64% of these conflict cases, the model's preference shifted to favor the audio once the text was removed, indicating that the audio information was present but suppressed. To address this, the authors developed a decoding rule called Gated Audio Counterfactual Logit Correction (GACL), which improves model faithfulness and can be applied without retraining.
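
To make the idea of a gated, decode-time logit correction concrete, here is a minimal Python sketch. It is not the paper's GACL rule, whose exact formulation this brief does not give. It assumes the counterfactual pass is the same audio with the conflicting text removed, and that the gate opens only when that pass confidently disagrees with the full pass. The function names, `alpha`, and `gate_threshold` are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def gated_logit_correction(full_logits, audio_only_logits,
                           alpha=1.0, gate_threshold=0.5):
    """Illustrative gated correction (an assumption, not the paper's rule).

    full_logits:       logits from the pass with audio plus text.
    audio_only_logits: logits from a counterfactual pass with the text removed.
    Returns (corrected_logits, gate_open).
    """
    audio_probs = softmax(audio_only_logits)
    audio_top = argmax(audio_probs)
    full_top = argmax(full_logits)

    # Gate: intervene only if the audio-only pass disagrees with the
    # full pass AND is confident in its own top choice.
    gate_open = (audio_top != full_top
                 and audio_probs[audio_top] >= gate_threshold)
    if not gate_open:
        return list(full_logits), False

    # Correction: move the full logits toward the audio-only logits.
    corrected = [f + alpha * (a - f)
                 for f, a in zip(full_logits, audio_only_logits)]
    return corrected, True


if __name__ == "__main__":
    # Toy conflict case: text pushes option 0, audio supports option 1.
    full = [2.0, 0.5, 0.1]
    audio_only = [0.2, 3.0, 0.1]
    corrected, opened = gated_logit_correction(full, audio_only)
    print("gate open:", opened, "chosen option:", argmax(corrected))
```

Because a rule of this kind acts only on logits at decoding time, it leaves model weights untouched, which is consistent with the brief's note that GACL needs no retraining.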

IMPACT Highlights a flaw in current audio-language models that could undermine their reliability in real-world applications, and points to a direction for future research.

Audio-Language Models
Gated Audio Counterfactual Logit Correction