Audio-language models often answer questions without hearing the audio, calling evaluation methods into question

By the PulseAugur editorial team · [4 sources] · 2026-04-27 12:25

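New research suggests that large audio-language models (LALMs) may lack genuine auditory perception despite scoring well on benchmarks. The studies show that these models can answer questions from text and general knowledge alone, retaining a substantial share of their performance when given no audio input. Moreover, when audio is needed, models typically need only fragments rather than the full clip, which calls into question how reliably current evaluation methods measure robust audio understanding.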
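One way to make "retaining a share of performance without audio" concrete is to compare accuracy with and without the clip, after subtracting the accuracy of random guessing. The sketch below is only an illustration under stated assumptions: the `Item` record, the `retained_fraction` helper, and the chance-corrected ratio are hypothetical choices made here, not the metric used in the cited papers, and the toy flags are invented for the example rather than taken from any result.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One benchmark question with two hypothetical correctness flags."""
    correct_with_audio: bool  # model saw the question and the audio clip
    correct_text_only: bool   # same question, audio withheld


def accuracy(flags):
    """Fraction of True values in a non-empty sequence of booleans."""
    flags = list(flags)
    if not flags:
        raise ValueError("need at least one item")
    return sum(flags) / len(flags)


def retained_fraction(items, chance):
    """Share of above-chance accuracy that survives when audio is removed.

    `chance` is the accuracy of random guessing, e.g. 0.25 for
    four-option multiple choice (an assumption of this sketch).
    """
    items = list(items)
    with_audio = accuracy(i.correct_with_audio for i in items)
    text_only = accuracy(i.correct_text_only for i in items)
    gain = with_audio - chance
    if gain <= 0:
        raise ValueError("with-audio accuracy does not exceed chance")
    return (text_only - chance) / gain


# Toy data, invented for illustration: 6 of 8 correct with audio,
# 4 of 8 correct with the audio withheld.
toy = (
    [Item(True, True)] * 4
    + [Item(True, False)] * 2
    + [Item(False, False)] * 2
)
print(retained_fraction(toy, chance=0.25))  # 0.5
```

A value near 1 would mean the benchmark can largely be solved from the text alone, while a value near 0 would mean the audio is doing most of the work. Correcting for chance keeps a multiple-choice benchmark from looking text-solvable merely because guessing already earns some credit.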

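Impact: The findings challenge current evaluation metrics for audio-language models and point to the need for more robust benchmark design to measure auditory understanding accurately.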

Ranking rationale: This cluster contains two academic papers published on arXiv about the evaluation of large audio-language models.


AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

arXiv cs.CL TIER_1 English(EN) · Leonardo Haw-Yang Foo, Chih-Kai Yang, Chen-An Li, Ke-Han Lu, Hung-yi Lee · 2026-04-28 04:00

All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation

arXiv:2604.24401v1. Abstract: Large Audio-Language Models show consistent performance gains across speech and audio benchmarks, yet high scores may not reflect true auditory perception. If a model can answer questions without processing the acoustic signal, th…
arXiv cs.CL TIER_1 English(EN) · Chen-An Li, Tzu-Han Lin, Hung-yi Lee · 2026-04-28 04:00

When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models

arXiv:2510.00626v3. Abstract: Large audio-language models (LALMs) unify speech and text processing, but their robustness in noisy real-world settings remains underexplored. We investigate how irrelevant audio, such as silence, synthetic noise, and envi…
arXiv cs.CL TIER_1 English(EN) · Hung-yi Lee · 2026-04-27 12:25

All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation

Large Audio-Language Models show consistent performance gains across speech and audio benchmarks, yet high scores may not reflect true auditory perception. If a model can answer questions without processing the acoustic signal, the benchmark fails as a measure of auditory underst…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-04-27 12:25

All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation

Large Audio-Language Models show consistent performance gains across speech and audio benchmarks, yet high scores may not reflect true auditory perception. If a model can answer questions without processing the acoustic signal, the benchmark fails as a measure of auditory underst…
