PulseAugur

New AI methods use multimodal large language models and graph networks to improve temporal video grounding

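Researchers have developed two new frameworks for temporal video grounding (TVG), the task of localizing a specific moment in a video that matches a text query. The MASRA framework uses a multimodal large language model (MLLM) during training to generate textual priors, which strengthen semantic and relational alignment and improve temporal consistency. The SDGAN framework uses graph convolutional networks (GCNs) to model temporal relations among video clips, combines static and dynamic visual features, and adds query-aware learning for more precise localization.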
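To make the GCN idea more concrete, here is a minimal plain-Python sketch of graph convolution over video-clip features for TVG. It is not SDGAN's published architecture. The window-based temporal graph, the frame-difference "dynamic" stream, fusion by summation, the untrained identity weight matrix, and cosine scoring against a query embedding are all illustrative assumptions, and every function name is hypothetical.

```python
import math


def temporal_adjacency(num_clips, window=1):
    """Temporal graph with self-loops: clip i links to every clip within `window` steps."""
    return [[1.0 if abs(i - j) <= window else 0.0 for j in range(num_clips)]
            for i in range(num_clips)]


def normalize(adj):
    """Symmetric normalization D^-1/2 A D^-1/2, as in the standard (Kipf & Welling) GCN."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(features, adj_norm, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    out = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


def dynamic_features(static):
    """Hypothetical motion proxy: change from the previous clip (zeros for clip 0)."""
    dyn = [[0.0] * len(static[0])]
    for t in range(1, len(static)):
        dyn.append([a - b for a, b in zip(static[t], static[t - 1])])
    return dyn


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def query_aware_scores(static, query, window=1):
    """Propagate static and dynamic clip features over the temporal graph,
    fuse them by summation, and score each clip against the query embedding."""
    adj = normalize(temporal_adjacency(len(static), window))
    dim = len(static[0])
    # Untrained stand-in for the learned weight matrix W.
    identity = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    h_static = gcn_layer(static, adj, identity)
    h_dynamic = gcn_layer(dynamic_features(static), adj, identity)
    fused = [[s + d for s, d in zip(rs, rd)] for rs, rd in zip(h_static, h_dynamic)]
    return [cosine(clip, query) for clip in fused]


# Toy example: four clips with 2-d features and a query embedding in the same space.
clips = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
query = [0.0, 1.0]
scores = query_aware_scores(clips, query)
best_clip = max(range(len(scores)), key=scores.__getitem__)
print(scores)
print(best_clip)  # clip 3 scores highest in this toy example
```

The graph step lets each clip's representation absorb context from its temporal neighbours before it is compared with the query, which is the general motivation for using GCNs in TVG. A real model would learn W and the query encoder jointly rather than use fixed vectors.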

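Impact: Both frameworks target better alignment between video content and text queries, which could improve how AI systems understand and index video data.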

Ranking rationale: This cluster contains two academic papers describing new approaches to temporal video grounding.


AI-generated summary · Google Gemini · from 4 sources.


Sources [4]

  1. arXiv cs.CV · TIER_1 · English (EN) · Ran Ran, Jiwei Wei, Shuchang Zhou, Yitong Qin, Shiyuan He, Zeyu Ma, Yuyang Zhou, Yang Yang

    MASRA: MLLM-Assisted Semantic-Relational Consistent Alignment for Video Temporal Grounding

    arXiv:2605.03398v1 · Abstract: Video Temporal Grounding (VTG) faces a cross-modal semantic gap that often leads to background features being incorrectly aligned with the query, while directly matching the query to moments results in insufficient discriminability …

  2. arXiv cs.CV · TIER_1 · English (EN) · Yang Yang

    MASRA: MLLM-Assisted Semantic-Relational Consistent Alignment for Video Temporal Grounding

    Video Temporal Grounding (VTG) faces a cross-modal semantic gap that often leads to background features being incorrectly aligned with the query, while directly matching the query to moments results in insufficient discriminability and consistency of temporal semantics. To addres…

  3. arXiv cs.CV · TIER_1 · English (EN) · Zhanjie Hu, Bolin Zhang, Jianhua Wang, Jianbo Zheng, Chenchen Yan, Takahiro Komamizu, Ichiro Ide, Jiangbo Qian

    Static and Dynamic Graph Alignment Network for Temporal Video Grounding

    arXiv:2605.00684v1 · Abstract: Temporal Video Grounding (TVG) aims to localize temporal moments in an untrimmed video that semantically correspond to given natural language queries. Recently, Graph Convolutional Networks (GCN) have been widely adopted in TVG to m…

  4. arXiv cs.CV · TIER_1 · English (EN) · Jiangbo Qian

    Static and Dynamic Graph Alignment Network for Temporal Video Grounding

    Temporal Video Grounding (TVG) aims to localize temporal moments in an untrimmed video that semantically correspond to given natural language queries. Recently, Graph Convolutional Networks (GCN) have been widely adopted in TVG to model temporal relations among video clips and en…
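The abstracts define TVG as localizing the moment in an untrimmed video that matches a natural language query. The sketch below is not taken from either paper. It shows one naive way to turn per-clip relevance scores into a (start, end) moment, and how such a prediction is typically compared with a ground-truth moment using temporal IoU (tIoU). The function names, the mean-score ranking, the fixed clip length, and all numbers are hypothetical.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) intervals, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def best_moment(scores, clip_len, min_clips=1, max_clips=None):
    """Exhaustively score every contiguous span of clips by its mean relevance
    and return (start_sec, end_sec, mean_score) for the best span."""
    n = len(scores)
    max_clips = n if max_clips is None else max_clips
    best = None
    for i in range(n):
        total = 0.0
        for j in range(i, min(n, i + max_clips)):
            total += scores[j]
            length = j - i + 1
            if length < min_clips:
                continue
            mean = total / length
            if best is None or mean > best[2]:
                best = (i * clip_len, (j + 1) * clip_len, mean)
    return best


# Made-up relevance scores for six 2-second clips.
scores = [0.1, 0.2, 0.9, 0.8, 0.3, 0.1]
pred = best_moment(scores, clip_len=2.0, min_clips=2)
print(pred)                                 # roughly (4.0, 8.0, 0.85)
print(temporal_iou(pred[:2], (5.0, 9.0)))   # 0.6 against a made-up ground truth
```

TVG benchmarks commonly report Recall@1 at tIoU thresholds such as 0.5 and 0.7, meaning that a top-ranked prediction counts as correct only if its tIoU with the annotated moment reaches the threshold. The exhaustive span search shown here is quadratic in the number of clips; practical models use learned proposals or direct boundary regression instead.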