EvoGround uses self-evolving agents for video temporal grounding

By PulseAugur Editorial · 1 source · 2026-05-13 17:25

Researchers have developed EvoGround, a framework that uses two self-evolving agents to perform video temporal grounding (VTG) without human-labeled data. A proposer agent generates query-moment pairs from raw videos, and a solver agent tries to localize the moment for each query; the solver's results serve as feedback that improves the proposer. Through this self-reinforcing loop the two agents improve each other. The authors report state-of-the-art results on VTG benchmarks, and the system can also act as a fine-grained video captioner.
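The proposer-solver loop described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not EvoGround's method. Everything in it is hypothetical:

- the classes `ToyProposer` and `ToySolver`;
- the use of temporal IoU as the feedback signal;
- the difficulty rule (shorter moments after a success, longer ones after a failure);
- the solver's shrinking noise.

The toy solver never looks at video. It perturbs the proposed moment with Gaussian noise so that the loop runs without a model.

```python
import random
from dataclasses import dataclass


@dataclass
class Pair:
    """A query-moment pair: a text query and its (start, end) span in seconds."""
    query: str
    start: float
    end: float


def temporal_iou(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Overlap of two (start, end) intervals divided by their union."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0


class ToyProposer:
    """Hypothetical proposer: emits pairs and tunes their difficulty from solver feedback."""

    def __init__(self, video_len: float, seed: int = 0):
        self.video_len = video_len
        self.span = video_len / 2  # moment length; shorter is treated as harder (assumption)
        self.rng = random.Random(seed)

    def propose(self) -> Pair:
        start = self.rng.uniform(0.0, self.video_len - self.span)
        # A real proposer would describe a moment of the raw video; this one makes up a label.
        return Pair(query=f"event at {start:.1f}s", start=start, end=start + self.span)

    def update(self, reward: float) -> None:
        # Assumed curriculum rule: harder pairs after a success, easier after a failure.
        if reward >= 0.5:
            self.span = max(1.0, self.span * 0.9)
        else:
            self.span = min(self.video_len, self.span * 1.1)


class ToySolver:
    """Hypothetical solver: predicts a moment with noise that shrinks as it trains."""

    def __init__(self, seed: int = 1):
        self.noise = 5.0  # std. dev. of the prediction error, in seconds
        self.rng = random.Random(seed)

    def ground(self, pair: Pair) -> tuple[float, float]:
        # Stand-in for reading the video: perturb the proposed moment with Gaussian noise.
        s = pair.start + self.rng.gauss(0.0, self.noise)
        e = pair.end + self.rng.gauss(0.0, self.noise)
        return (min(s, e), max(s, e))

    def train_on(self, pair: Pair) -> None:
        # Assumed: each proposed pair acts as a pseudo-label that sharpens the solver a little.
        self.noise = max(0.2, self.noise * 0.97)


def self_evolve(steps: int = 200, video_len: float = 60.0) -> tuple[float, float]:
    """Run the propose -> ground -> feedback loop; return (final span, final solver noise)."""
    proposer, solver = ToyProposer(video_len), ToySolver()
    for _ in range(steps):
        pair = proposer.propose()
        pred = solver.ground(pair)
        reward = temporal_iou(pred, (pair.start, pair.end))  # solver feedback to the proposer
        proposer.update(reward)
        solver.train_on(pair)
    return proposer.span, solver.noise


print(self_evolve())
```

Calling `self_evolve()` runs 200 rounds and returns two values: the proposer's final moment length and the solver's final noise level. In this toy, the noise settles at its 0.2-second floor. The point of the sketch is the data flow, in which the proposer's output becomes the solver's training target and the solver's score steers the proposer. It does not model how either real agent works.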

IMPACT Introduces a novel method for video analysis that bypasses the need for extensive manual annotation, potentially accelerating research and application development in video understanding.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV · Lorenzo Torresani · 2026-05-13 17:25

EvoGround: Self-Evolving Video Agents for Video Temporal Grounding

Video temporal grounding (VTG) takes an untrimmed video and a natural-language query as input and localizes the temporal moment that best matches the query. Existing methods rely on large, task-specific datasets requiring costly manual annotation. We introduce EvoGround, a framew…
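To make the task definition concrete: a VTG prediction is a (start, end) interval. Benchmarks commonly score it by its temporal intersection-over-union (tIoU) with the annotated interval and report Recall@1 at thresholds such as 0.5 or 0.7. The excerpt does not say which metrics EvoGround uses, so the sketch below is a generic illustration. The function names `temporal_iou` and `recall_at_1` are hypothetical.

```python
Interval = tuple[float, float]  # (start_sec, end_sec)


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Length of the overlap divided by the length of the union of two intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: list[Interval], gts: list[Interval], threshold: float = 0.5) -> float:
    """Fraction of queries whose single top prediction reaches the tIoU threshold."""
    if len(preds) != len(gts) or not gts:
        raise ValueError("need one prediction per ground-truth interval")
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


# Three queries that share the same annotated moment, with three different predictions.
gts = [(0.0, 10.0), (0.0, 10.0), (0.0, 10.0)]
preds = [(0.0, 10.0), (5.0, 15.0), (20.0, 30.0)]
print(recall_at_1(preds, gts, threshold=0.5))
print(recall_at_1(preds, gts, threshold=0.3))
```

Each prediction gets a different score:

- The first prediction matches exactly.
- The second overlaps its target with a tIoU of 1/3, so it counts at a 0.3 threshold but not at 0.5.
- The third does not overlap at all.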
