Harnessing Streaming Video in the Wild
Researchers have developed a framework called Streaming Harness that enables Vision-Language Models (VLMs) to process unbounded video streams in real time. The system equips VLMs with proactive interaction, long-term memory retention of up to 12 hours, and sub-second processing latency. To support this work, the researchers also introduced a streaming dataset, Streaming-Train-248K, and a benchmark, Streaming-Eval, intended to drive further progress in deployable streaming intelligence.
AI IMPACT: Enables real-time analysis of live video feeds for applications such as assistants and robotics, moving beyond offline video understanding.