PulseAugur
HAT-4D: Lifting Monocular Video for 4D Multi-Object Interactions via Human-Agent Collaboration

HAT-4D framework reconstructs 3D object interactions from a single video

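Researchers have introduced HAT-4D, a novel agentic framework that reconstructs the 3D geometry, temporal dynamics, and physical interactions of multiple objects from a single monocular video. The method combines vision-language models (VLMs) with a human-in-the-loop feedback mechanism to overcome challenges such as depth ambiguity and occlusion in multi-object scenes. HAT-4D is intended as a scalable data engine for embodied AI and for training VLAs, and it has been used to create MVOIK-4D, a new benchmark for monocular 4D interaction reconstruction.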

Impact: Reconstructing complex object interactions from a single video enables more efficient data collection for embodied AI and VLA training.

Ranking rationale: This cluster describes a research paper detailing a novel framework and benchmark for 4D reconstruction from video.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.AI TIER_1 English(EN) · Jiaxin Li, Yuxiang Wu, Zhenkai Zhang, Xinrui Shi, Haoyuan Wang, Yichen Zhao, Su Linxiang, Chenyang Yu, Mingyu Zhang, Yifan Ding, Boran Wen, Li Zhang, Ruiyang Liu, Yong-Lu Li ·

    HAT-4D: Lifting Monocular Video for 4D Multi-Object Interactions via Human-Agent Collaboration

    arXiv:2606.28215v1 Announce Type: cross Abstract: Extracting dynamic 4D object interactions from massive, in-the-wild monocular videos offers a highly efficient data collection pathway for scaling Embodied AI and training VLAs. However, existing monocular 4D reconstruction method…

  2. arXiv cs.AI TIER_1 English(EN) · Yong-Lu Li ·

    HAT-4D: Lifting Monocular Video for 4D Multi-Object Interactions via Human-Agent Collaboration

    Extracting dynamic 4D object interactions from massive, in-the-wild monocular videos offers a highly efficient data collection pathway for scaling Embodied AI and training VLAs. However, existing monocular 4D reconstruction methods primarily focus on isolated objects, often faili…