TimeProVe: Propose, then Verify for Efficient Long Video Temporal Reasoning in Activities of Daily Living
Researchers have developed TimeProVe, a novel framework designed to improve the efficiency of temporal reasoning in long videos, particularly for activities of daily living. This approach uses lightweight modules to propose potential answer-evidence hypotheses before engaging a more computationally expensive vision-language model (VLM) for targeted verification. To evaluate its effectiveness, the team also introduced OpenTSUBench (OTB), a new benchmark for assessing temporal reasoning in real-world scenarios. Experiments demonstrated that TimeProVe significantly reduces VLM calls and inference costs while achieving state-of-the-art results on OTB and competitive performance on other benchmarks like Charades-STA.
AI IMPACT: This framework could significantly reduce the computational cost of analyzing long videos, making advanced temporal reasoning more accessible for various applications.
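The propose-then-verify pattern described in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Hypothesis` class, the `propose_then_verify` function, the call budget, and the stub verifier are all hypothetical names and choices introduced here for illustration. The sketch assumes a cheap proposer has already scored candidate answer-evidence pairs, and it only shows how checking the best-scored candidates first can limit calls to an expensive verifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical answer-evidence pair from a lightweight proposer."""
    answer: str
    start_s: float  # start of the proposed evidence segment, in seconds
    end_s: float    # end of the proposed evidence segment, in seconds
    score: float    # proposer confidence (assumed; higher is better)


def temporal_iou(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Intersection over union of two time intervals (0.0 when disjoint)."""
    inter = max(0.0, min(a_end, b_end) - max(a_start, b_start))
    span = max(a_end, b_end) - min(a_start, b_start)
    return inter / span if span > 0 else 0.0


def propose_then_verify(
    hypotheses: List[Hypothesis],
    verify: Callable[[Hypothesis], bool],
    budget: int = 3,
) -> Tuple[Optional[Hypothesis], int]:
    """Verify hypotheses in descending score order.

    Stops at the first verified hypothesis or once `budget` verifier
    calls have been made. Returns (accepted hypothesis or None, calls used).
    """
    calls = 0
    for hyp in sorted(hypotheses, key=lambda h: h.score, reverse=True):
        if calls >= budget:
            break
        calls += 1  # each verify() call stands in for one expensive VLM call
        if verify(hyp):
            return hyp, calls
    return None, calls


# Toy data: a stub verifier replaces the VLM (assumption for illustration).
TRUE_ANSWER, TRUE_START, TRUE_END = "drinking water", 120.0, 135.0


def stub_verify(hyp: Hypothesis) -> bool:
    """Accept when the answer matches and the segments overlap enough."""
    overlap = temporal_iou(hyp.start_s, hyp.end_s, TRUE_START, TRUE_END)
    return hyp.answer == TRUE_ANSWER and overlap >= 0.5


candidates = [
    Hypothesis("making coffee", 40.0, 60.0, 0.55),
    Hypothesis("drinking water", 118.0, 136.0, 0.80),
    Hypothesis("drinking water", 300.0, 320.0, 0.30),
]

best, calls_used = propose_then_verify(candidates, stub_verify)
print(best, calls_used)
```

In this toy run the highest-scored hypothesis is accepted on the first verifier call, so the other two candidates are never sent to the verifier. When no hypothesis passes, the function returns `None` after at most `budget` calls, which bounds the worst-case cost.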