Researchers have developed a new framework to evaluate audio hallucinations in egocentric videos, cases where models infer sounds from visual cues even though those sounds are not actually present in the audio. Their study found that advanced audio-visual large language models (AV-LLMs) such as Qwen2.5 Omni exhibit significant hallucination rates. The team curated a dataset of 300 videos and created 1,000 sound-focused questions to probe model outputs, categorizing hallucinations into foreground action sounds and background ambient sounds.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Highlights the need for robust evaluation of hallucinations in AV-LLMs to improve their reliability.
RANK_REASON The cluster contains an academic paper detailing a new evaluation framework for audio hallucinations in AV-LLMs.
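To make the evaluation setup described above concrete, here is a minimal sketch of how hallucination rates could be scored from sound-focused yes/no probes over the two categories the summary names. This is not the authors' code; the `ProbeItem` schema, field names, and example data are all hypothetical.

```python
# Hypothetical scoring sketch for an audio-hallucination benchmark:
# each probe asks an AV-LLM whether a given sound occurs in a clip,
# and a hallucination is a "yes" on a sound that is not audible.
from dataclasses import dataclass

@dataclass
class ProbeItem:
    video_id: str
    question: str          # sound-focused yes/no probe
    category: str          # "foreground_action" or "background_ambient"
    sound_present: bool    # ground truth: is the sound actually audible?
    model_says_yes: bool   # model's answer to the probe

def hallucination_rates(items: list[ProbeItem]) -> dict[str, float]:
    """Fraction of absent-sound probes the model answers 'yes' to,
    broken out by hallucination category."""
    rates: dict[str, float] = {}
    for cat in {it.category for it in items}:
        absent = [it for it in items
                  if it.category == cat and not it.sound_present]
        if absent:
            false_yes = sum(it.model_says_yes for it in absent)
            rates[cat] = false_yes / len(absent)
    return rates

# Hypothetical usage: two probes where no such sound is audible.
items = [
    ProbeItem("v001", "Do you hear the knife chopping?",
              "foreground_action", False, True),
    ProbeItem("v001", "Is there traffic noise in the background?",
              "background_ambient", False, False),
]
print(hallucination_rates(items))
# e.g. {'foreground_action': 1.0, 'background_ambient': 0.0}
```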