Researchers have developed ORCA, a novel model-based approach for assessing the correctness of open-ended responses from large audio language models (LALMs). The system relies on a three-stage annotation pipeline (human judgment, structured feedback, and human-AI correction) that produced a dataset of over 9,600 annotations. ORCA models show strong agreement with human correctness ratings: a Spearman correlation of 0.91 on known benchmarks and 0.85 on new, unseen benchmarks, outperforming models such as Gemini 2.5 Flash.
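The headline figures are Spearman rank correlations between a judge model's scores and human correctness ratings. The sketch below shows how that metric is computed, using only the Python standard library. The function names and the toy scores are hypothetical and purely illustrative; this is not ORCA's evaluation code or data.

```python
def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: human correctness ratings vs. a judge's scores.
human = [1, 2, 3, 4, 5]
judge = [0.1, 0.3, 0.2, 0.8, 0.9]
print(round(spearman(human, judge), 3))  # 0.9
```

Ranking first makes the metric depend only on whether the judge orders responses the way humans do, not on the judge's score scale. In practice, `scipy.stats.spearmanr` computes the same statistic.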
IMPACT This new assessment method could speed up development and improve the reliability of audio-based AI models by providing more accurate evaluation of their open-ended answers.
RANK_REASON The cluster describes a new research paper detailing a novel method for evaluating AI models.
- arXiv
- Audio Question Answering
- DagsHub
- Gemini 2.5 Flash
- Hugging Face
- ORCA
- Santosh Kesiraju
- ScienceCast
AI-generated summary · Google Gemini · from 1 source.