Spectro-Temporal Interference Confounds Phase Encoding in Spatial Audio Foundation Models
A new paper published on arXiv explores the limitations of current spatial audio foundation models, finding that they often rely on spectro-temporal interference rather than precise phase encoding for localization tasks. The researchers developed a psychoacoustic benchmark based on the binaural masking level difference (BMLD) and used it to test nine audio models. Dedicated binaural spatial models showed a BMLD comparable to analytical baselines, whereas general-purpose binaural models relied on interference textures, which points to a potential confound in their performance metrics.
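For readers unfamiliar with the BMLD, the sketch below illustrates the classic psychoacoustic setup it comes from: a tone masked by noise that is identical at both ears becomes easier to detect when the tone is phase-inverted at one ear. This is a minimal illustration under textbook assumptions, not the paper's benchmark; the function names, tone frequency, sample rate, and amplitudes are all hypothetical choices made for the example.

```python
import math
import random


def make_bmld_stimuli(freq_hz=500.0, sample_rate=16000, duration_s=0.1,
                      signal_amp=0.1, seed=0):
    """Build the two classic BMLD conditions as (left, right) channel pairs.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    n = int(sample_rate * duration_s)

    # N0: the same Gaussian noise is presented to both ears.
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # S: the pure tone that a listener (or model) has to detect in the noise.
    tone = [signal_amp * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

    # N0S0: tone in phase at both ears, so left and right are identical.
    diotic = [x + s for x, s in zip(noise, tone)]
    n0s0 = (diotic, list(diotic))

    # N0Spi: tone inverted (180 degree phase shift) at the right ear only.
    n0spi = ([x + s for x, s in zip(noise, tone)],
             [x - s for x, s in zip(noise, tone)])

    return n0s0, n0spi, tone


def bmld_db(threshold_n0s0_db, threshold_n0spi_db):
    """BMLD in dB: how much lower the detection threshold is for N0Spi."""
    return threshold_n0s0_db - threshold_n0spi_db


if __name__ == "__main__":
    n0s0, n0spi, tone = make_bmld_stimuli()
    print(len(n0s0[0]), "samples per channel")
    # Hypothetical thresholds, only to show how the difference is computed.
    print(bmld_db(-10.0, -22.0), "dB")
```

In the N0Sπ condition the only cue separating the two channels is the interaural phase of the tone, which is why a detection advantage there is usually read as evidence of phase sensitivity. A system that reaches the same advantage through other cues would produce a similar number for a different reason.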