Researchers have developed a novel extension of Shapley values to explain the behavior of multimodal large language models (MLLMs). The framework handles combined text and audio input by treating the features of both modalities as cooperating players, and it relies on efficient estimation strategies to keep the computation feasible. It introduces a new preprocessing method, Spectrogram-Guided Phonetic Alignment (SGPA), which aligns audio segments with text, and ships as an open-source package with a GUI for visualization. Experiments on datasets such as VoiceBench and Infinity Instruct show that input modality significantly affects attributions and that standard importance proxies are insufficient in multimodal, cross-lingual contexts.
AI IMPACT Provides a new method for understanding and potentially debugging complex multimodal AI systems.
RANK_REASON This is a research paper detailing a new methodology for explaining AI models.
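To make the idea of treating text and audio features as cooperating players more concrete, here is a minimal sketch of a permutation-sampling Shapley estimate. It is a generic Monte Carlo estimator and not the paper's own estimation strategy, which the summary does not describe. The player names, the `toy_model_score` value function, and its weights are all hypothetical stand-ins for a real model's output on masked inputs.

```python
import random
from typing import Callable, Dict, FrozenSet, List

# Hypothetical players: two text tokens and two audio segments.
PLAYERS: List[str] = ["text:turn", "text:left", "audio:seg0", "audio:seg1"]

# Hypothetical per-player weights for a toy value function (assumption).
WEIGHTS: Dict[str, float] = {
    "text:turn": 0.2,
    "text:left": 0.1,
    "audio:seg0": 0.05,
    "audio:seg1": 0.3,
}


def toy_model_score(coalition: FrozenSet[str]) -> float:
    """Stand-in for a model's output when only `coalition` is unmasked.

    Additive weights plus one cross-modal interaction term, so that a
    text token and an audio segment share credit.
    """
    score = sum(WEIGHTS[p] for p in coalition)
    if "text:left" in coalition and "audio:seg1" in coalition:
        score += 0.5  # hypothetical text-audio interaction
    return score


def shapley_permutation_estimate(
    players: List[str],
    value_fn: Callable[[FrozenSet[str]], float],
    num_permutations: int = 200,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled player orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition: set = set()
        prev = value_fn(frozenset(coalition))
        for p in order:
            coalition.add(p)
            cur = value_fn(frozenset(coalition))
            totals[p] += cur - prev  # marginal contribution of p
            prev = cur
    return {p: t / num_permutations for p, t in totals.items()}


estimates = shapley_permutation_estimate(PLAYERS, toy_model_score)
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

Each sampled ordering telescopes, so the estimates always sum to the score of the full input minus the score of the empty input. Sampling orderings rather than enumerating all coalitions is what keeps the cost manageable as the number of tokens and audio segments grows.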
- Infinity Instruct
- Multimodal Large Language Models
- Shapley Values
- Spectrogram-Guided Phonetic Alignment
- VoiceBench
AI-generated summary · Google Gemini · from 1 source.