ORCA: Open-ended Response Correctness Assessment for Audio Question Answering

New ORCA system accurately evaluates audio LLM responses

By the PulseAugur editorial team · [1 source] · 2026-06-30 04:00

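Researchers have developed ORCA, a new model-based method for assessing the correctness of open-ended responses from large audio language models (LALMs). The system relies on a three-stage annotation pipeline that combines human judgment, structured feedback, and human-AI collaborative correction, and it produced a dataset of more than 9,600 annotations. The ORCA model performs strongly: its scores reach a Spearman correlation of 0.91 with human correctness ratings on seen benchmarks and 0.85 on unseen benchmarks, outperforming models such as Gemini 2.5 Flash.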
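The Spearman correlation quoted above measures how well the ranking produced by an automatic judge agrees with the ranking produced by human raters. The following is a minimal sketch of that computation in plain Python, not the authors' evaluation code: the function names and the five example scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: human correctness ratings and an automatic judge's
# scores for the same five responses (made-up numbers, not from the paper).
human_ratings = [1, 2, 3, 4, 5]
judge_scores = [0.10, 0.40, 0.35, 0.80, 0.90]

rho = spearman(human_ratings, judge_scores)
print(f"Spearman correlation: {rho:.2f}")
```

Because the coefficient depends only on ranks, the judge's scores do not need to share a scale with the human ratings; only the ordering of the responses matters.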

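Impact: By providing more accurate evaluation metrics, this new assessment method could accelerate the development of audio-based AI models and improve their reliability.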

Ranking rationale: This cluster describes a recent research paper detailing a new evaluation method for AI models.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Šimon Sedláček, Sara Barahona, Bolaji Yusuf, Laura Herrera-Alarcón, Santosh Kesiraju, Cecilia Bolaños, Alicia Lozano-Diez, Sathvik Udupa, Fernando López, Allison Ferner, Ramani Duraiswami, Jan Černocký · 2026-06-30 04:00

ORCA: Open-ended Response Correctness Assessment for Audio Question Answering

arXiv:2512.09066v2 Announce Type: replace-cross Abstract: Reliable assessment of the abilities of large audio language models (LALMs) is essential to advancing the state of the art. As benchmarks rapidly evolve to incorporate complex reasoning and subjective tasks, they increasin…

