PulseAugur

Vision-language models reconstruct 3D scenes as editable Blender programs

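Researchers have developed a new framework called Staged Executable Inverse Graphics (SEIG) that uses vision-language models to reconstruct a 3D scene from a single image. The method generates editable Blender programs, so geometry, materials, and lighting can be manipulated without specialized 3D models or multi-view data. The staged reconstruction approach significantly improves fidelity and supports a range of downstream applications.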

Impact: Makes creating and editing 3D scenes from a single image more intuitive, which may affect content creation and simulation.

Ranking rationale: This cluster contains a research paper detailing a new framework for inverse graphics using vision-language models.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

  1. Hugging Face Daily Papers TIER_1 English(EN)

    Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

    Pretrained vision-language models can reconstruct 3D scenes from single images as editable Blender programs through progressive refinement, demonstrating improved fidelity through staged reconstruction approaches.

  2. arXiv cs.CV TIER_1 English(EN) · Guangzhao He, Rundong Luo, Wei-Chiu Ma, Hadar Averbuch-Elor

    Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

    arXiv:2606.02580v1. Abstract: Inverse graphics is a longstanding and highly underconstrained problem that seeks to reconstruct images as editable 3D scenes which can be rendered, relit, and manipulated. In this work, we investigate whether pretrained vision-lang…

  3. arXiv cs.CV TIER_1 English(EN) · Hadar Averbuch-Elor

    Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

    Inverse graphics is a longstanding and highly underconstrained problem that seeks to reconstruct images as editable 3D scenes which can be rendered, relit, and manipulated. In this work, we investigate whether pretrained vision-language models (VLMs) can perform executable invers…