A Systematic Evaluation of Positional Bias in Multi-Video Summarization with MLLMs
Researchers have developed a benchmark to systematically evaluate positional bias in multimodal large language models (MLLMs) when they summarize multiple videos. Their findings indicate that summary quality can depend on the order in which the videos are presented to the model, and that this bias varies across domains and models. The study also explored prompt-based mitigation techniques and concluded that current multi-video summarization systems remain susceptible to input order, which points to the need for more robust, order-invariant multimodal systems.
IMPACT: Highlights a critical limitation in current multimodal LLMs, pushing for the development of more robust, order-invariant systems for video understanding tasks.
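
To make the idea of order sensitivity concrete, here is a minimal Python sketch of how it could be probed: summarize the same set of videos in every possible order and look at how much a quality score spreads. This is an illustration only, not the benchmark's actual protocol. The names `order_sensitivity`, `toy_summarize`, and `toy_score` are hypothetical, and the toy model and metric stand in for a real MLLM call and a real summary-quality measure.

```python
from itertools import permutations
from statistics import mean, pstdev
from typing import Callable, Dict, Sequence


def order_sensitivity(
    videos: Sequence[str],
    summarize: Callable[[Sequence[str]], str],
    score: Callable[[str], float],
) -> Dict[str, float]:
    """Score the summary produced for every ordering of `videos`.

    `summarize` and `score` are hypothetical stand-ins for an MLLM call and
    a summary-quality metric; neither is taken from the study.
    """
    scores = [score(summarize(list(order))) for order in permutations(videos)]
    return {
        "mean": mean(scores),
        "std": pstdev(scores),                # spread across orderings
        "range": max(scores) - min(scores),   # best order minus worst order
    }


def toy_summarize(ordered: Sequence[str]) -> str:
    # Toy model with a primacy bias: it only "sees" the first video.
    return ordered[0]


def toy_score(summary: str) -> float:
    # Toy metric: reward summaries that mention the key event.
    return 1.0 if "goal" in summary else 0.0


clips = ["goal replay", "crowd shots", "interview"]
stats = order_sensitivity(clips, toy_summarize, toy_score)
print(stats)
```

An order-invariant system would give a range and standard deviation of zero under this probe. Enumerating all permutations grows factorially with the number of videos, so for larger sets a random sample of orderings would be the practical choice.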