PulseAugur

Audio-language models often answer questions without audio, challenging evaluation methods.

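New research suggests that large audio-language models (LALMs) may lack genuine auditory perception despite scoring highly on benchmarks. The studies show that these models can answer questions from text and general knowledge alone, retaining a considerable share of their performance with no audio input. Moreover, when audio is needed, models typically require only fragments rather than the full clip, which calls into question how reliably current evaluation methods measure robust audio understanding.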
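One way to make "retaining a share of performance without audio" concrete is to compare accuracy with and without the audio input, after subtracting the chance level. The sketch below is illustrative only: the function names, the toy predictions, and the chance-corrected ratio are assumptions made for this example, not the metric or data used in the cited papers.

```python
def accuracy(predictions, answers):
    """Fraction of predictions that exactly match the reference answers."""
    if not answers or len(predictions) != len(answers):
        raise ValueError("predictions and answers must be non-empty and equal in length")
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


def retained_fraction(acc_with_audio, acc_without_audio, chance):
    """Share of above-chance accuracy that survives when the audio is removed.

    Hypothetical metric for illustration: 0.0 means the model falls to chance
    without audio, 1.0 means removing the audio changes nothing.
    """
    gain_with_audio = acc_with_audio - chance
    if gain_with_audio <= 0:
        raise ValueError("accuracy with audio must exceed chance")
    return (acc_without_audio - chance) / gain_with_audio


# Toy four-option multiple-choice benchmark (made-up data, not from the papers).
answers = ["A", "B", "C", "D", "A", "B", "C", "D"]
with_audio = ["A", "B", "C", "D", "A", "B", "C", "A"]      # 7 of 8 correct
without_audio = ["A", "B", "C", "D", "A", "C", "D", "A"]   # 5 of 8 correct

acc_audio = accuracy(with_audio, answers)
acc_text_only = accuracy(without_audio, answers)
retained = retained_fraction(acc_audio, acc_text_only, chance=0.25)

print(f"with audio: {acc_audio:.3f}, text only: {acc_text_only:.3f}, retained: {retained:.2f}")
```

A retained fraction close to 1 on a benchmark item set would indicate that the questions can largely be answered from the text alone, which is the concern the summary describes.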

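Impact: Challenges the metrics currently used to evaluate audio-language models and indicates that more robust benchmark design is needed to measure auditory understanding accurately.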

Ranking rationale: This cluster contains two academic papers published on arXiv about the evaluation of large audio-language models.


AI-generated summary · Google Gemini · from 4 sources.


Sources [4]

  1. arXiv cs.CL TIER_1 English(EN) · Leonardo Haw-Yang Foo, Chih-Kai Yang, Chen-An Li, Ke-Han Lu, Hung-yi Lee ·

    All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation

    arXiv:2604.24401v1 Announce Type: cross Abstract: Large Audio-Language Models show consistent performance gains across speech and audio benchmarks, yet high scores may not reflect true auditory perception. If a model can answer questions without processing the acoustic signal, th…

  2. arXiv cs.CL TIER_1 English(EN) · Chen-An Li, Tzu-Han Lin, Hung-yi Lee ·

    When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models

    arXiv:2510.00626v3 Announce Type: replace-cross Abstract: Large audio-language models (LALMs) unify speech and text processing, but their robustness in noisy real-world settings remains underexplored. We investigate how irrelevant audio, such as silence, synthetic noise, and envi…

  3. arXiv cs.CL TIER_1 English(EN) · Hung-yi Lee ·

    All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation

    Large Audio-Language Models show consistent performance gains across speech and audio benchmarks, yet high scores may not reflect true auditory perception. If a model can answer questions without processing the acoustic signal, the benchmark fails as a measure of auditory underst…

  4. Hugging Face Daily Papers TIER_1 English(EN) ·

    All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation

    Large Audio-Language Models show consistent performance gains across speech and audio benchmarks, yet high scores may not reflect true auditory perception. If a model can answer questions without processing the acoustic signal, the benchmark fails as a measure of auditory underst…