VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models
Apple researchers have introduced VSAS-Bench, a framework for evaluating visual streaming assistant models in real time. Unlike earlier offline evaluation methods, VSAS-Bench includes metrics for proactiveness and consistency, two properties that matter for streaming VLMs. The benchmark provides over 18,000 temporally dense annotations across various domains and task types, along with standardized evaluation protocols and metrics that isolate specific streaming VLM capabilities. The authors' evaluations showed that adapted conventional VLMs can outperform specialized streaming models, with Qwen3-VL-4B achieving a 3% lead over the top-performing streaming VLM on the benchmark.
IMPACT: Introduces a new benchmark for evaluating real-time visual streaming assistants, potentially driving improvements in their proactiveness and consistency.