PulseAugur

Vision-language models reconstruct 3D scenes as editable Blender programs

Researchers have developed a framework called Staged Executable Inverse Graphics (SEIG) that uses pretrained vision-language models to reconstruct 3D scenes from single images. The method outputs editable Blender programs, so geometry, materials, and lighting can be manipulated without specialized 3D models or multi-view data. Reconstructing the scene in stages through progressive refinement improves fidelity and supports a range of downstream applications.
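To make "editable Blender program" concrete, here is a minimal sketch of what such an output could look like. It is not the paper's pipeline: the `Cube` and `Scene` classes, the three stage functions, and the geometry, materials, lighting ordering are all assumptions made for illustration. The sketch only emits the text of a Blender Python script, so it runs without Blender installed; the emitted script uses standard `bpy` calls and is meant to be run inside Blender.

```python
from dataclasses import dataclass, field


@dataclass
class Cube:
    """Hypothetical scene element; not a structure taken from the paper."""
    name: str
    location: tuple
    size: float = 2.0
    color: tuple = (0.8, 0.8, 0.8, 1.0)  # RGBA


@dataclass
class Scene:
    """Hypothetical scene description that the stages turn into code."""
    cubes: list = field(default_factory=list)
    sun_energy: float = 3.0


def geometry_stage(scene):
    # Stage 1 (assumed ordering): place the objects.
    lines = ["import bpy", ""]
    for c in scene.cubes:
        lines.append(
            f"bpy.ops.mesh.primitive_cube_add(size={c.size}, location={c.location})"
        )
        lines.append(f"bpy.context.active_object.name = {c.name!r}")
    return lines


def material_stage(scene):
    # Stage 2 (assumed ordering): attach one material per object.
    lines = []
    for c in scene.cubes:
        mat_name = c.name + "_mat"
        lines.append(f"mat = bpy.data.materials.new(name={mat_name!r})")
        lines.append(f"mat.diffuse_color = {c.color}")
        lines.append(f"bpy.data.objects[{c.name!r}].data.materials.append(mat)")
    return lines


def lighting_stage(scene):
    # Stage 3 (assumed ordering): add a single sun light.
    return [
        "sun = bpy.data.lights.new(name='Sun', type='SUN')",
        f"sun.energy = {scene.sun_energy}",
        "sun_obj = bpy.data.objects.new('Sun', sun)",
        "bpy.context.collection.objects.link(sun_obj)",
    ]


def emit_program(scene):
    # Concatenate the stages into one script to run inside Blender.
    lines = []
    for stage in (geometry_stage, material_stage, lighting_stage):
        lines.extend(stage(scene))
    return "\n".join(lines)


scene = Scene(cubes=[Cube("table", (0.0, 0.0, 1.0))])
program = emit_program(scene)

# Editing the scene means changing a parameter and emitting the program again.
scene.cubes[0].color = (1.0, 0.0, 0.0, 1.0)
edited = emit_program(scene)
```

Because the result is a program rather than a mesh or a radiance field, an edit such as recoloring the table is a one-line change in the script. That is the kind of editability the summary describes.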

IMPACT Enables more intuitive 3D scene creation and editing from single images, potentially impacting content creation and simulation.

RANK_REASON The cluster contains a research paper detailing a new framework for inverse graphics using vision-language models.


AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

  1. Hugging Face Daily Papers TIER_1 English(EN)

    Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

    Pretrained vision-language models can reconstruct 3D scenes from single images as editable Blender programs through progressive refinement, demonstrating improved fidelity through staged reconstruction approaches.

  2. arXiv cs.CV TIER_1 English(EN) · Guangzhao He, Rundong Luo, Wei-Chiu Ma, Hadar Averbuch-Elor

    Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

    arXiv:2606.02580v1. Abstract: Inverse graphics is a longstanding and highly underconstrained problem that seeks to reconstruct images as editable 3D scenes which can be rendered, relit, and manipulated. In this work, we investigate whether pretrained vision-lang…

  3. arXiv cs.CV TIER_1 English(EN) · Hadar Averbuch-Elor

    Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

    Inverse graphics is a longstanding and highly underconstrained problem that seeks to reconstruct images as editable 3D scenes which can be rendered, relit, and manipulated. In this work, we investigate whether pretrained vision-language models (VLMs) can perform executable invers…