Keep It in Mind: User-Centric Continual Spatial Intelligence Reasoning in Egocentric Video Streams
Researchers have introduced UCS-Bench, a benchmark dataset for evaluating user-centric continual spatial intelligence in egocentric video streams. It comprises over 170 hours of video and more than 8,000 questions that test dynamic spatial reasoning and long-term memory relative to the user's location. To address this challenge, the authors developed DirectMe, a framework that builds and maintains a structured spatial memory from streaming egocentric observations, improving recall of object locations and enabling long-horizon queries. Experiments demonstrate that DirectMe significantly enhances the spatial reasoning capabilities of leading multimodal LLMs and outperforms existing spatially aware and long-form streaming video models.
AI IMPACT: Enhances spatial reasoning in egocentric AI assistants by improving memory and location recall from video streams.
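To make the idea of a user-relative spatial memory concrete, here is a minimal Python sketch. It is not DirectMe's actual design, which this summary does not describe. The class `SpatialMemory`, its methods `observe` and `locate`, the 2D world frame, and the 45-degree direction sectors are all assumptions chosen for illustration. The sketch keeps the last-seen position of each object and answers "where is it relative to me now?" from the user's current pose.

```python
import math
from dataclasses import dataclass


@dataclass
class Sighting:
    """Last known world-frame position of an object (hypothetical record)."""
    x: float
    y: float
    t: float  # stream timestamp in seconds


class SpatialMemory:
    """Toy last-seen store; an illustration, not DirectMe's memory structure."""

    def __init__(self):
        self._last_seen = {}

    def observe(self, label, x, y, t):
        # Keep only the most recent sighting so moved objects stay current.
        prev = self._last_seen.get(label)
        if prev is None or t >= prev.t:
            self._last_seen[label] = Sighting(x, y, t)

    def locate(self, label, user_x, user_y, user_heading):
        """Return (distance, direction) relative to the user's current pose.

        user_heading is in radians, counter-clockwise from the +x axis.
        Returns None if the object was never observed.
        """
        s = self._last_seen.get(label)
        if s is None:
            return None
        dx, dy = s.x - user_x, s.y - user_y
        distance = math.hypot(dx, dy)
        # Angle to the object relative to the facing direction, wrapped to [-pi, pi].
        rel = math.atan2(dy, dx) - user_heading
        rel = math.atan2(math.sin(rel), math.cos(rel))
        if abs(rel) <= math.pi / 4:
            direction = "ahead"
        elif abs(rel) >= 3 * math.pi / 4:
            direction = "behind"
        elif rel > 0:
            direction = "left"
        else:
            direction = "right"
        return distance, direction


memory = SpatialMemory()
memory.observe("keys", 2.0, 0.0, t=10.0)
# The user has since walked to (2, -3) and faces along +x.
print(memory.locate("keys", 2.0, -3.0, 0.0))
```

Because the answer is computed from the user's pose at query time, the same stored sighting yields different directions as the user moves, which is the user-centric aspect the benchmark description emphasizes. A real system would also need to estimate object positions and user pose from video, which this sketch takes as given.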