Training-free Uncertainty Guidance for Complex Visual Tasks with MLLMs

MLLMs use their intrinsic uncertainty to improve performance on visual tasks

By the PulseAugur editorial team · [1 source] · 2026-06-29 04:00

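Researchers have developed a novel training-free framework that uses the intrinsic uncertainty of multimodal large language models (MLLMs) to improve their performance on complex visual tasks. The core idea is that an MLLM's uncertainty drops when it is given relevant visual information, so that uncertainty can steer the model toward the most informative data. The approach has been applied to visual search, long-video understanding, and temporal grounding, where it achieves results competitive with specialized, fine-tuned systems without any task-specific training.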
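The selection idea described above can be illustrated with a minimal sketch. It assumes uncertainty is measured as the Shannon entropy of the model's answer distribution and that the candidate (an image crop or a video segment) with the lowest entropy is kept; the paper's actual scoring rule may differ. The names `entropy`, `select_least_uncertain`, and `answer_distribution` are hypothetical, and the lookup table is a toy stand-in for a real MLLM call.

```python
import math
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalize defensively; zero entries are skipped because 0 * log(0) is taken as 0.
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)


def select_least_uncertain(
    candidates: Sequence[T],
    answer_distribution: Callable[[T], Sequence[float]],
) -> Tuple[T, float]:
    """Return the candidate with the lowest answer entropy, plus that entropy.

    `answer_distribution` is a hypothetical callable standing in for an MLLM
    that returns probabilities over possible answers for a given visual input.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    # Score every candidate; ties are broken by original order via the index.
    scored = [(entropy(answer_distribution(c)), i, c) for i, c in enumerate(candidates)]
    best_entropy, _, best = min(scored, key=lambda item: (item[0], item[1]))
    return best, best_entropy


# Toy stand-in for model outputs (assumed values, not results from the paper).
toy_distributions = {
    "crop_background": [0.5, 0.5],     # model is unsure
    "crop_partial": [0.7, 0.3],        # somewhat confident
    "crop_with_object": [0.95, 0.05],  # relevant evidence, low uncertainty
}

best_crop, best_h = select_least_uncertain(
    list(toy_distributions), toy_distributions.__getitem__
)
print(best_crop, round(best_h, 3))
```

In this toy setup the crop that contains the relevant object is selected because its answer distribution is the most peaked. No training is involved, only repeated scoring of candidate inputs with the same frozen model.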

Impact: This approach could give multimodal AI systems more efficient and more generalizable fine-grained perception.

Ranking rationale: This cluster contains an academic paper that details a new method for MLLMs.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · Sanghwan Kim, Rui Xiao, Stephan Alaniz, Yongqin Xian, Zeynep Akata · 2026-06-29 04:00

Training-free Uncertainty Guidance for Complex Visual Tasks with MLLMs

arXiv:2510.00705v3 Announce Type: replace Abstract: Multimodal Large Language Models (MLLMs) often struggle with fine-grained perception, such as identifying small objects in high-resolution images or detecting key moments in long videos. Existing methods typically rely on comple…

