Brief · PulseAugur

TOOL · arXiv cs.CV · English (EN) · 12h

A Benchmark for Omni-Modal Reasoning in Long Videos

Researchers have introduced LongShOTBench, a benchmark for evaluating omni-modal reasoning in long videos. It integrates vision, speech, and ambient audio, and provides detailed rubrics for diagnostic evaluation. Alongside the benchmark, they developed LongShOTAgent, a training-free agent that performs strongly on the new testbed and outperforms current multi-modal large language models.

LongShOTBench
LongShOTAgent
Mohammed Irfan Kurpath