Brief · PulseAugur

TOOL · arXiv cs.CV English(EN) · 8h

Conditional Multi-Event Temporal Grounding in Long-Form Video

Researchers have introduced CoMET-Bench, a benchmark for Conditional Multi-Event Temporal Grounding in long-form video. Existing benchmarks fall short because they typically localize only a single event or treat grounding and counting as separate tasks. CoMET-Bench combines a large dataset of complex queries with a unified evaluation protocol and a new Rejection-F1 metric, aiming to address limitations in current methods. The authors also propose CoMET-Agent, an agentic framework that reformulates the task as structured search and aggregation and that demonstrated improved performance over GPT-5.

Hugging Face
GPT-5
arXiv
CoMET-Bench
CoMET-Agent