Researchers have developed Dr. SHAP-AV, a framework that uses Shapley values to analyze how audio-visual speech recognition models balance acoustic and visual information. Experiments across six models and varying noise levels show that models rely more on visual input in noisy conditions, yet audio contributions remain significant. The analysis also found that the modality balance shifts during speech generation and that signal-to-noise ratio is the primary driver of modality weighting, indicating a persistent audio bias in current models.
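To make the idea of a Shapley-based modality split concrete, here is a minimal sketch of exact Shapley values for two "players", audio and video. It is not taken from the paper: the function names (`shapley_values`, `toy_value`) and the scores in `TOY_SCORES` are hypothetical, chosen only to illustrate how a combined score can be divided between modalities.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over every ordering of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical recognition scores for each subset of modalities.
# These numbers are invented for illustration, not measured results.
TOY_SCORES = {
    frozenset(): 0.0,
    frozenset({"audio"}): 0.6,
    frozenset({"video"}): 0.3,
    frozenset({"audio", "video"}): 0.8,
}


def toy_value(coalition):
    """Look up the toy score for a set of available modalities."""
    return TOY_SCORES[frozenset(coalition)]


phi = shapley_values(["audio", "video"], toy_value)
print(phi)
```

In this toy setup the two attributions sum to the score of the full audio-visual input, which is the property that lets Shapley values be read as each modality's share of the result.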
IMPACT Provides a diagnostic tool to understand and potentially improve the robustness of audio-visual AI systems.
RANK_REASON Academic paper detailing a new framework for analyzing model behavior.
AI-generated summary · Google Gemini · from 1 source.