Dr. SHAP-AV: Decoding Relative Modality Contributions via Shapley Attribution in Audio-Visual Speech Recognition

New framework decodes modality contributions in audio-visual speech recognition

By PulseAugur Editorial · [1 source] · 2026-06-09 04:00

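Researchers have developed Dr. SHAP-AV, a framework that uses Shapley values to analyze how audio-visual speech recognition (AVSR) models balance acoustic and visual information. Experiments across six models and a range of noise levels show that although models rely more heavily on the visual stream under noisy conditions, the audio contribution remains substantial. The analysis also finds that the modality balance shifts over the course of generation, and that the signal-to-noise ratio (SNR) is the main driver of modality weighting, which points to a persistent audio bias in current models.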
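To make the idea of Shapley-based modality attribution concrete, here is a minimal sketch that treats the audio and video streams as two "players" and computes their exact Shapley values from a value function. It is an illustration under stated assumptions, not the procedure used in the paper: the names `shapley_values` and `relative_shares` and the scores in `toy_scores` are all hypothetical. In practice the value function would come from running an AVSR model with each modality present or masked.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values for a small set of players.

    `value` maps a frozenset of players to a scalar score
    (hypothetical here: e.g. a recognition-quality measure).
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that excludes p
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when added to coalition s
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi

def relative_shares(phi):
    """Normalize absolute Shapley values so they sum to 1."""
    total = sum(abs(v) for v in phi.values())
    return {p: abs(v) / total for p, v in phi.items()}

# Made-up scores for illustration only (NOT results from the paper):
# model quality with neither, one, or both modalities available.
toy_scores = {
    frozenset(): 0.0,
    frozenset({"audio"}): 0.6,
    frozenset({"video"}): 0.2,
    frozenset({"audio", "video"}): 0.9,
}

phi = shapley_values(["audio", "video"], lambda s: toy_scores[s])
shares = relative_shares(phi)
print(phi)
print(shares)
```

With two players, each Shapley value is simply the average of that modality's marginal gain when added alone and when added on top of the other modality. The values sum to the gap between the full-input score and the empty-input score, which is what makes the normalized shares interpretable as relative contributions.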

Impact: Provides a diagnostic tool for understanding, and potentially improving, the robustness of audio-visual AI systems.

Ranking rationale: Academic paper detailing a new framework for analyzing model behavior.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · Umberto Cappellazzo, Stavros Petridis, Maja Pantic · 2026-06-09 04:00

Dr. SHAP-AV: Decoding Relative Modality Contributions via Shapley Attribution in Audio-Visual Speech Recognition

arXiv:2603.12046v2 Announce Type: replace-cross Abstract: Audio-Visual Speech Recognition (AVSR) leverages both acoustic and visual information for robust recognition under noise. However, how models balance these modalities remains unclear. We present Dr. SHAP-AV, a framework us…

