
TOOL · arXiv cs.CV English(EN) · 1mo

SceneGraphVLM: Dynamic Scene Graph Generation from Video with Vision-Language Models

Researchers have developed SceneGraphVLM, a method for generating dynamic scene graphs from video with compact vision-language models. The approach serializes each graph into the token-efficient TOON format and uses a two-stage training process that includes reinforcement learning with specialized rewards, aimed at improving precision and reducing irrelevant objects. SceneGraphVLM offers a strong quality-speed trade-off: with vLLM acceleration it runs at near real-time speed, and it provides lightweight temporal context for video analysis.
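To illustrate what serializing a scene graph into a TOON-style layout can look like, here is a minimal Python sketch. It assumes the tabular form of TOON, where a uniform list of records is written as a header `name[count]{fields}:` followed by one comma-separated row per record. The field names (`id`, `label`, `subject`, `predicate`, `object`) and the helpers `to_toon_table` and `serialize_frame` are hypothetical and chosen for illustration; the brief does not give the schema SceneGraphVLM actually uses.

```python
from typing import Dict, List, Sequence


def to_toon_table(name: str, fields: Sequence[str], rows: List[Dict[str, object]]) -> str:
    """Write uniform records as a TOON-style table: one header line, one row per record.

    Simplification: values are assumed to contain no commas or newlines,
    so no quoting or escaping is performed.
    """
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    body = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *body])


def serialize_frame(objects: List[Dict[str, object]],
                    relations: List[Dict[str, object]]) -> str:
    """Serialize one frame's scene graph (hypothetical schema) as two tables."""
    return "\n".join([
        to_toon_table("objects", ["id", "label"], objects),
        to_toon_table("relations", ["subject", "predicate", "object"], relations),
    ])


# Toy scene graph for a single frame: a person holding a cup.
objects = [{"id": 1, "label": "person"}, {"id": 2, "label": "cup"}]
relations = [{"subject": 1, "predicate": "holding", "object": 2}]

frame_text = serialize_frame(objects, relations)
print(frame_text)
```

Because field names appear once per table instead of once per record, this layout typically needs fewer tokens than the equivalent JSON, which matters when a language model must emit a graph for every frame.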

IMPACT Introduces a more efficient method for structured visual perception from video, potentially improving downstream AI tasks that rely on understanding scene context.

SceneGraphVLM
Vladislav Makarov