CRAFT: Critic-Refined Adaptive Key-Frame Targeting for Multimodal Video Question Answering
Researchers have developed CRAFT, a pipeline for multimodal video question answering that identifies and verifies claims in news archives. The system selects keyframes dynamically, uses automatic speech recognition with multilingual support, and runs an iterative critic loop to refine and correct claims. On the MAGMaR 2026 benchmark, CRAFT achieved the highest scores in overall average, reference recall, and citation F1.
AI IMPACT: Introduces a method for grounding claims in video evidence, potentially improving the reliability of AI-driven video analysis and summarization.
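For readers unfamiliar with the citation F1 metric named in the results, the sketch below shows one common way such a score can be computed: as the harmonic mean of precision and recall over sets of cited evidence. This is an illustration only. The summary does not say how the MAGMaR 2026 benchmark defines citation F1, so the set-based matching, the function name citation_f1, and the segment identifiers are all assumptions.

```python
def citation_f1(predicted, reference):
    """Return (precision, recall, f1) for two collections of citation IDs.

    Assumption: a citation counts as correct only on an exact ID match.
    The actual benchmark may score citations differently.
    """
    pred, ref = set(predicted), set(reference)
    if not pred or not ref:
        return 0.0, 0.0, 0.0
    hits = len(pred & ref)
    precision = hits / len(pred)   # share of cited items that are correct
    recall = hits / len(ref)       # share of reference items that were cited
    if hits == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: three cited segments, two of which appear
# in a reference set of four.
p, r, f = citation_f1(
    ["v1:00:12", "v1:01:40", "v2:00:05"],
    ["v1:00:12", "v1:01:40", "v3:00:30", "v3:02:10"],
)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Under these assumptions, a system that cites many irrelevant segments is penalized through precision, while one that misses supporting segments is penalized through recall.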