PulseAugur
AI Models Use Reasoning to Tackle Zero-Shot Composed Video Retrieval

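Researchers have developed new frameworks for zero-shot composed video retrieval, a task in which a system must find a target video given a reference video and a textual modification instruction. The methods, presented at the CVPR 2026 VidLLMs workshop, use frozen foundation models to reason about the implied changes and to re-rank candidate videos. One approach, R3-CoVR, achieves high accuracy by combining post-edit descriptions generated by a multimodal LLM with a constraint-aware re-ranker; another, R^3, focuses on reasoning-guided recall and re-ranking.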
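The recall-then-re-rank pattern described above can be illustrated with a minimal sketch. It is not the implementation from any of the cited papers. It assumes the reasoning step has already run, so the query vector stands in for an embedding of the post-edit description, and the `required` and `forbidden` term lists stand in for constraints extracted from the edit instruction. All names here (`recall`, `rerank`, the toy gallery and captions) are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall(query: Sequence[float], gallery: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Stage 1: return the k gallery ids most similar to the query embedding."""
    ranked = sorted(gallery, key=lambda vid: cosine(query, gallery[vid]), reverse=True)
    return ranked[:k]


def rerank(candidates: List[str], captions: Dict[str, str],
           required: List[str], forbidden: List[str]) -> List[str]:
    """Stage 2: reorder candidates by how well their captions satisfy the edit.

    Score = (required terms present) - (forbidden terms present).
    Python's sort is stable, so ties keep their recall-stage order.
    """
    def score(vid: str) -> int:
        words = set(captions[vid].lower().split())
        return sum(t in words for t in required) - sum(t in words for t in forbidden)

    return sorted(candidates, key=score, reverse=True)


# Toy data (assumed): embeddings and captions for four gallery videos.
gallery = {
    "v1": [1.0, 0.0, 0.0],
    "v2": [0.9, 0.1, 0.0],
    "v3": [0.0, 1.0, 0.0],
    "v4": [0.8, 0.0, 0.6],
}
captions = {
    "v1": "a dog runs on grass",
    "v2": "a dog runs on snow",
    "v3": "a car drives at night",
    "v4": "a cat sits on snow",
}

# Hypothetical edit: "make it snowy" applied to a reference video of a dog on grass.
query = [1.0, 0.0, 0.0]
shortlist = recall(query, gallery, k=3)
final = rerank(shortlist, captions, required=["dog", "snow"], forbidden=["grass"])
print(shortlist, final)
```

In this toy setup the recall stage alone would rank the unedited reference-like video first. The constraint check in the second stage is what moves the video matching the edit to the top, which is the motivation for re-ranking in this kind of pipeline.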

Impact: Introduces new video retrieval methods that use LLM-based reasoning, which could make search more accurate and flexible.

Ranking rationale: Multiple research papers propose novel frameworks for a specific AI task.

AI-generated summary · Google Gemini · based on 4 sources.

Sources [4]

  1. arXiv cs.LG TIER_1 English(EN) · Ali Alavi ·

    Reason, Retrieve, Re-rank: A Zero-Shot Reasoning-Aware Framework for Composed Video Retrieval

    arXiv:2606.00910v1 Announce Type: cross Abstract: Composed Video Retrieval (CoVR) seeks the target video that results from applying a free-form textual modification to a reference video. We address the Reason-Aware CoVR (CoVR-R) challenge at the CVPR 2026 VidLLMs workshop,…

  2. arXiv cs.CV TIER_1 English(EN) · Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Weili Guan, Liqiang Nie ·

    R^3: Composed Video Retrieval via Reasoning-Guided Recalling and Re-ranking

    arXiv:2606.01113v1 Announce Type: new Abstract: The CoVR-R challenge evaluates composed video retrieval, where a system must retrieve a target video from a large gallery given a reference video and a textual edit instruction. This setting is not a standard video-text retrieval pr…

  3. arXiv cs.CV TIER_1 English(EN) · Yang Liu, Qianqian Xu, Peisong Wen, Siran Dai, Qingming Huang ·

    Training-Free Composed Video Retrieval via Visual Representation-Guided Video-LLM Reasoning

    arXiv:2606.02321v1 Announce Type: new Abstract: Recent advances in large vision-language models have expanded video retrieval from simple text-based search to more flexible scenarios, where users may specify the desired result through both visual examples and textual instructions…

  4. arXiv cs.CV TIER_1 English(EN) · Qingming Huang ·

    Training-Free Composed Video Retrieval via Visual Representation-Guided Video-LLM Reasoning

    Recent advances in large vision-language models have expanded video retrieval from simple text-based search to more flexible scenarios, where users may specify the desired result through both visual examples and textual instructions. In the CVPR 2026 Reason-Aware Composed Video R…