TRACE: Evidence Grounding-Guided Multi-Video Event Understanding and Claim Generation
Researchers have developed TRACE, a framework for multi-video event understanding and claim generation. TRACE follows a ground-before-reasoning strategy: it first builds a text-searchable timeline for each video using OCR and object detection, and a text-only LLM then localizes the relevant evidence before any visual reasoning begins. This ordering is intended to improve factual completeness and attribution fidelity. In experiments, TRACE significantly outperforms baseline models on benchmarks such as MAGMaR 2026, achieving state-of-the-art results.
IMPACT: Improves the ability of AI systems to process and reason over multiple video sources, with better factual accuracy and citation of evidence.
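The ground-before-reasoning idea described in the summary can be illustrated with a minimal sketch. This is not the TRACE implementation: the `TimelineEntry` record, the `localize_evidence` function, the sample timeline text, and the keyword-overlap scoring are all hypothetical stand-ins. In particular, simple token overlap replaces the text-only LLM that the summary says performs the localization, and the OCR and object-detection outputs are hard-coded rather than produced by real models.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class TimelineEntry:
    """One text-searchable timeline segment (hypothetical schema)."""
    video_id: str
    start_s: float
    end_s: float
    source: str  # "ocr" or "object"; assumed labels, not from the paper
    text: str


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def localize_evidence(query: str,
                      timeline: List[TimelineEntry],
                      top_k: int = 2) -> List[TimelineEntry]:
    """Rank timeline segments by token overlap with the query.

    Assumption: this overlap score is a stand-in for the text-only LLM
    that localizes evidence before visual reasoning begins.
    """
    query_tokens = tokenize(query)
    scored = []
    for entry in timeline:
        overlap = len(query_tokens & tokenize(entry.text))
        if overlap > 0:
            scored.append((overlap, entry))
    # Highest overlap first; ties broken by video id, then start time.
    scored.sort(key=lambda pair: (-pair[0], pair[1].video_id, pair[1].start_s))
    return [entry for _, entry in scored[:top_k]]


def cite(entry: TimelineEntry) -> str:
    """Format a segment as a citation string for claim attribution."""
    return f"[{entry.video_id} {entry.start_s:.1f}-{entry.end_s:.1f}s]"


# Hard-coded timelines standing in for OCR and object-detection output.
timeline = [
    TimelineEntry("vid_a", 0.0, 5.0, "ocr", "Breaking news: bridge closed after flood"),
    TimelineEntry("vid_a", 5.0, 10.0, "object", "car, person, umbrella"),
    TimelineEntry("vid_b", 12.0, 18.0, "ocr", "Flood warning issued for river district"),
    TimelineEntry("vid_b", 18.0, 24.0, "object", "boat, person"),
]

hits = localize_evidence("Which bridge was closed by the flood?", timeline)
citations = [cite(entry) for entry in hits]
```

Only the segments returned in `hits` would then be passed to a vision model for reasoning, which is the point of grounding first: the expensive visual step sees a small, already-attributed subset of the footage, and each generated claim can carry the citation of the segment that supports it.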