Brief · PulseAugur

RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

Researchers have introduced OVO-S-Bench, a new benchmark designed to evaluate the spatial intelligence of multimodal large language models (MLLMs) in streaming environments. This benchmark features 1,680 questions across 348 videos, with a focus on continuous egocentric streams relevant to robotics and autonomous driving. Initial evaluations show that Gemini-3.1-Pro lags significantly behind human experts, particularly in allocentric mapping tasks, and surprisingly, specialized streaming MLLMs underperform their base models. AI

IMPACT Establishes a new, demanding testbed for streaming spatial MLLMs, highlighting current limitations and guiding future development.

Gemini-3.1-Pro
MLLMs
OVO-S-Bench