AI models tackle zero-shot composed video retrieval through reasoning

By PulseAugur Editorial · [4 sources] · 2026-06-01 14:35

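Researchers have developed new frameworks for zero-shot composed video retrieval, the task of finding a target video given a reference video and a textual modification instruction. The methods were developed for the Reason-Aware CoVR (CoVR-R) challenge at the CVPR 2026 VidLLMs workshop and use frozen foundation models to reason about implied changes and re-rank candidate videos. One method, R3-CoVR, reports high accuracy by combining a multimodal LLM that generates post-edit descriptions with a constraint-aware re-ranker; another, R^3, focuses on reasoning-guided recall and re-ranking.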
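To make the recall-then-re-rank idea concrete, here is a minimal sketch in Python. It is not the code of any of the papers covered here: the toy embedding vectors, the tag sets standing in for edit constraints, and every function name are assumptions made for illustration. In a real system the query vector would presumably come from encoding a description of the edited video written by a multimodal LLM.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall(query_vec, gallery, k):
    """Stage 1 (hypothetical): keep the k gallery videos closest to the query."""
    ranked = sorted(
        gallery,
        key=lambda vid: cosine(query_vec, gallery[vid]["vec"]),
        reverse=True,
    )
    return ranked[:k]


def rerank(candidates, gallery, required_tags):
    """Stage 2 (hypothetical): prefer candidates that satisfy more constraints.

    sorted() is stable, so ties keep their stage-1 order.
    """
    return sorted(
        candidates,
        key=lambda vid: len(required_tags & gallery[vid]["tags"]),
        reverse=True,
    )


# Toy gallery: the vectors and tags are made up for this example.
gallery = {
    "v1": {"vec": [1.0, 0.0], "tags": {"dog", "beach"}},
    "v2": {"vec": [0.9, 0.1], "tags": {"dog", "snow"}},
    "v3": {"vec": [0.0, 1.0], "tags": {"cat", "snow"}},
}

# Stand-in for the embedding of a generated post-edit description,
# e.g. "the same dog, but now in snow".
query_vec = [1.0, 0.05]
required_tags = {"dog", "snow"}

shortlist = recall(query_vec, gallery, k=2)
final = rerank(shortlist, gallery, required_tags)
print(shortlist, final)
```

In this toy run the similarity search alone ranks v1 first, while the constraint check promotes v2, which is the kind of correction a re-ranking stage is meant to provide.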

Impact: Introduces new video retrieval methods that use LLM reasoning, which could improve search accuracy and flexibility.

Ranking rationale: Multiple research papers propose novel frameworks for a specific AI task.


AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

arXiv cs.LG TIER_1 English(EN) · Ali Alavi · 2026-06-02 04:00

Reason, Retrieve, Re-rank: A Zero-Shot Reasoning-Aware Framework for Composed Video Retrieval

arXiv:2606.00910v1. Abstract: Composed Video Retrieval (CoVR) seeks the target video that results from applying a free-form textual modification to a reference video. We address the Reason-Aware CoVR (CoVR-R) challenge at the CVPR 2026 VidLLMs workshop,…

arXiv cs.CV TIER_1 English(EN) · Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Weili Guan, Liqiang Nie · 2026-06-02 04:00

R^3: Composed Video Retrieval via Reasoning-Guided Recall and Re-ranking

arXiv:2606.01113v1. Abstract: The CoVR-R challenge evaluates composed video retrieval, where a system must retrieve a target video from a large gallery given a reference video and a textual edit instruction. This setting is not a standard video-text retrieval pr…

arXiv cs.CV TIER_1 English(EN) · Yang Liu, Qianqian Xu, Peisong Wen, Siran Dai, Qingming Huang · 2026-06-02 04:00

Training-Free Composed Video Retrieval via Visual-Representation-Guided Video-LLM Reasoning

arXiv:2606.02321v1. Abstract: Recent advances in large vision-language models have expanded video retrieval from simple text-based search to more flexible scenarios, where users may specify the desired result through both visual examples and textual instructions…

arXiv cs.CV TIER_1 English(EN) · Qingming Huang · 2026-06-01 14:35

Training-Free Composed Video Retrieval via Visual-Representation-Guided Video-LLM Reasoning

Recent advances in large vision-language models have expanded video retrieval from simple text-based search to more flexible scenarios, where users may specify the desired result through both visual examples and textual instructions. In the CVPR 2026 Reason-Aware Composed Video R…
