PulseAugur

New benchmark CoMET-Bench tackles multi-event video grounding

Researchers have introduced CoMET-Bench, a benchmark for Conditional Multi-Event Temporal Grounding in long-form video. Existing benchmarks typically localize only a single event or treat grounding and counting as separate tasks, so they do not cover queries that require locating every event that satisfies a set of conditions. CoMET-Bench pairs a large dataset of complex queries with a unified evaluation protocol that includes a new Rejection-F1 metric. The authors also propose an agentic framework, CoMET-Agent, which reformulates the task as structured search and aggregation and is reported to outperform GPT-5.

RANK_REASON The cluster contains a research paper introducing a new benchmark and methodology for video temporal grounding.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CV TIER_1 English(EN) · Yuanhao Zou, Arthad Kulkarni, Lucas Tonanez, Lincoln Spencer, Guangyu Sun, Tianxingjian Ding, Andong Deng, Yi Li, Shuangjun Liu, Yuan Li, Dashan Gao, Ning Bi, Taotao Jing, Shuai Zhang, Chen Chen ·

    Conditional Multi-Event Temporal Grounding in Long-Form Video

    arXiv:2606.15320v1. Abstract: Multimodal large language models have made rapid progress in video temporal grounding, yet real-world applications routinely require localizing every event that satisfies compositional temporal and spatial conditions. Existing bench…