New research grounds Video-LLMs in physical reality with adversarial curriculum

By the PulseAugur editorial team · [2 sources] · 2026-05-06 05:48

A new research paper introduces the Unified Attribution Theory, which attributes Video-LLMs' struggles with physical reasoning to "Semantic Prior Dominance" rather than to perceptual deficits. To address this, the paper proposes the Programmatic Adversarial Curriculum (PACC) dataset and the Visual-Anchored Reasoning Chain (VARC) method. According to the summary, experiments show that fine-tuning on PACC significantly improves physical reasoning in state-of-the-art models without architectural changes.

Impact: Introduces a dataset and a method for improving physical reasoning in Video-LLMs, which could enhance their real-world applicability.

Ranking rationale: Academic paper detailing a new theory and dataset for improving Video-LLM physical reasoning.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CV TIER_1 English(EN) · Zicheng Zhao, Chaofan Gan, Shijie Li, Weiyao Lin · 2026-05-07 04:00

From Priors to Perception: Grounding Video-LLMs in Physical Reality

arXiv:2605.04515v1 Announce Type: new Abstract: While Video Large Language Models (Video-LLMs) excel in general understanding, they exhibit systematic deficits in fine-grained physical reasoning. Existing interventions not only suffer from limited generalization but fundamentally…

arXiv cs.CV TIER_1 English(EN) · Weiyao Lin · 2026-05-06 05:48

From Priors to Perception: Grounding Video-LLMs in Physical Reality

While Video Large Language Models (Video-LLMs) excel in general understanding, they exhibit systematic deficits in fine-grained physical reasoning. Existing interventions not only suffer from limited generalization but fundamentally conflate generative artifacts with genuine phys…
