ParaPairAudioBench: Paralinguistic Pairwise Audio Benchmark for LALM-as-a-Judge

New benchmark evaluates LALMs on subtle speech characteristics · 2 sources tracked

By the PulseAugur editorial team · [2 sources] · 2026-06-23 14:43

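Researchers have introduced ParaPairAudioBench, a new benchmark for evaluating how well Large Audio-Language Models (LALMs) distinguish fine-grained paralinguistic features in speech. The benchmark contains 5,175 audio pairs spanning five dimensions: style, speaking rate, emphasis, age, and gender. Experiments show that current LALM judges fall short of human judgment by an average of 32 percentage points and are severely miscalibrated, especially in cases that call for abstention.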
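To make the headline numbers concrete, here is a minimal sketch of how a judge-versus-human gap in percentage points and an abstention rate could be computed for pairwise judgments. The summary does not describe the benchmark's actual data format or metrics, so the record keys (`dimension`, `gold`, `judge`, `human`), the `"tie"` abstention label, and both function names are assumptions made for illustration only, and the toy records are invented.

```python
from collections import defaultdict


def score_judge(records):
    """Per-dimension accuracy for judge and human, plus the gap in percentage points.

    Each record is a dict with hypothetical keys: "dimension", "gold",
    "judge", "human". Labels are "A", "B", or "tie" (assumed abstention label).
    """
    stats = defaultdict(lambda: {"n": 0, "judge_ok": 0, "human_ok": 0})
    for r in records:
        s = stats[r["dimension"]]
        s["n"] += 1
        s["judge_ok"] += int(r["judge"] == r["gold"])
        s["human_ok"] += int(r["human"] == r["gold"])

    report = {}
    for dim, s in stats.items():
        judge_acc = 100.0 * s["judge_ok"] / s["n"]
        human_acc = 100.0 * s["human_ok"] / s["n"]
        report[dim] = {
            "judge_acc": judge_acc,
            "human_acc": human_acc,
            "gap_pp": human_acc - judge_acc,  # positive: judge trails humans
        }
    return report


def abstention_recall(records):
    """Fraction of pairs with a gold "tie" on which the judge also abstained."""
    ties = [r for r in records if r["gold"] == "tie"]
    if not ties:
        return None
    return sum(r["judge"] == "tie" for r in ties) / len(ties)


# Invented toy records, not benchmark data.
records = [
    {"dimension": "rate", "gold": "A", "judge": "A", "human": "A"},
    {"dimension": "rate", "gold": "tie", "judge": "B", "human": "tie"},
    {"dimension": "age", "gold": "B", "judge": "B", "human": "B"},
    {"dimension": "age", "gold": "tie", "judge": "tie", "human": "tie"},
]
report = score_judge(records)
recall = abstention_recall(records)
```

In this toy data the judge trails the human labels on the `rate` dimension because it picks a side on a pair where abstaining is correct, which is the kind of failure the summary attributes to current LALM judges.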

Impact: The benchmark is expected to drive improvements in finer-grained, more reliable speech evaluation by LALMs.

Ranking rationale: This cluster describes a new academic benchmark for evaluating AI models.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Jisu Jeon, Seungyeon Jwa, Joosung Lee, Jinhyeon Kim, Woojin Chung, Hwiyeol Jo, Jeonghoon Kim, Jonghyun Choi, Soyoon Kim · 2026-06-24 04:00

ParaPairAudioBench: Paralinguistic Pairwise Audio Benchmark for LALM-as-a-Judge

arXiv:2606.24648v1 Announce Type: cross Abstract: Large Audio-Language Models (LALMs) have been widely used as judge models for the automatic evaluation of generated speech. However, prior approaches predominantly focus on holistic naturalness, leaving fine-grained paralinguistic…
arXiv cs.CL TIER_1 English(EN) · Soyoon Kim · 2026-06-23 14:43

ParaPairAudioBench: Paralinguistic Pairwise Audio Benchmark for LALM-as-a-Judge

Large Audio-Language Models (LALMs) have been widely used as judge models for the automatic evaluation of generated speech. However, prior approaches predominantly focus on holistic naturalness, leaving fine-grained paralinguistic distinctions underexplored. We introduce ParaPair…
