From Priors to Perception: Grounding Video-LLMs in Physical Reality

New research grounds Video-LLMs in physical reality via an adversarial curriculum

By the PulseAugur Editorial Team · [2 sources] · 2026-05-06 05:48

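A new research paper proposes a unified attribution theory, arguing that the difficulties Video-LLMs have with physical reasoning stem from "semantic prior dominance" rather than from perception failures. To address this, the paper introduces the Programmatic Adversarial Curriculum (PACC) dataset and the Visual-Anchored Reasoning Chain (VARC) method. Experiments show that fine-tuning on PACC significantly improves the physical reasoning of state-of-the-art models without any change to their architecture.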

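Impact: Introduces a new dataset and method for improving the physical reasoning of Video-LLMs, which could strengthen their use in real-world applications.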

Ranking rationale: An academic paper that details a new theory and dataset for improving the physical reasoning of Video-LLMs.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 2 sources.

Coverage sources [2]

arXiv cs.CV TIER_1 English(EN) · Zicheng Zhao, Chaofan Gan, Shijie Li, Weiyao Lin · 2026-05-07 04:00

From Priors to Perception: Grounding Video-LLMs in Physical Reality

arXiv:2605.04515v1 Announce Type: new Abstract: While Video Large Language Models (Video-LLMs) excel in general understanding, they exhibit systematic deficits in fine-grained physical reasoning. Existing interventions not only suffer from limited generalization but fundamentally…
arXiv cs.CV TIER_1 English(EN) · Weiyao Lin · 2026-05-06 05:48

From Priors to Perception: Grounding Video-LLMs in Physical Reality

While Video Large Language Models (Video-LLMs) excel in general understanding, they exhibit systematic deficits in fine-grained physical reasoning. Existing interventions not only suffer from limited generalization but fundamentally conflate generative artifacts with genuine phys…
