Vision-language models reconstruct 3D scenes as editable Blender programs

By PulseAugur Editorial · [3 sources] · 2026-06-01 00:00

Researchers have introduced Staged Executable Inverse Graphics (SEIG), a framework that uses pretrained vision-language models to reconstruct 3D scenes from single images as editable Blender programs. Because the output is a program, geometry, materials, and lighting can be edited directly, without specialized 3D models or multi-view data. The authors report that reconstructing the scene in stages improves fidelity, which in turn supports downstream applications.
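To make "editable Blender program" concrete, here is a minimal sketch in Python. It is not the authors' method: the `Primitive` and `Scene` classes, the `emit_blender_script` helper, and the geometry, then materials, then lighting ordering are all illustrative assumptions. The sketch only builds the text of a Blender script from a small scene description, so it runs without Blender installed; the emitted text uses standard `bpy` calls and is meant to be run inside Blender.

```python
from dataclasses import dataclass, field


@dataclass
class Primitive:
    # Hypothetical scene element: a cube with a name, position, size, and RGBA color.
    name: str
    location: tuple
    size: float
    color: tuple = (0.8, 0.8, 0.8, 1.0)


@dataclass
class Scene:
    # Hypothetical scene description: objects plus a single point light.
    objects: list = field(default_factory=list)
    light_location: tuple = (4.0, -4.0, 6.0)
    light_energy: float = 1000.0


def emit_blender_script(scene):
    """Return the text of a Blender Python script, built stage by stage."""
    lines = ["import bpy", ""]

    # Stage 1 (assumed ordering): geometry.
    for obj in scene.objects:
        lines.append(
            f"bpy.ops.mesh.primitive_cube_add(size={obj.size}, location={obj.location})"
        )
        lines.append(f"bpy.context.active_object.name = {obj.name!r}")

    # Stage 2 (assumed ordering): materials.
    for obj in scene.objects:
        lines.append(f"mat = bpy.data.materials.new(name={obj.name + '_mat'!r})")
        lines.append(f"mat.diffuse_color = {obj.color}")
        lines.append(f"bpy.data.objects[{obj.name!r}].data.materials.append(mat)")

    # Stage 3 (assumed ordering): lighting.
    lines.append(
        f"bpy.ops.object.light_add(type='POINT', location={scene.light_location})"
    )
    lines.append(f"bpy.context.active_object.data.energy = {scene.light_energy}")
    return "\n".join(lines)


scene = Scene(objects=[Primitive("crate", (0.0, 0.0, 0.5), 1.0)])
script = emit_blender_script(scene)
print(script)

# Editing the scene is a matter of changing a value and re-emitting the program.
scene.objects[0].color = (1.0, 0.0, 0.0, 1.0)
edited_script = emit_blender_script(scene)
```

The point of the sketch is the design choice the summary describes: when a scene is represented as a program, recoloring an object or moving a light is a one-line change rather than a re-reconstruction.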

IMPACT Enables more intuitive 3D scene creation and editing from single images, potentially impacting content creation and simulation.

RANK_REASON The cluster contains a research paper detailing a new framework for inverse graphics using vision-language models.


AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-01 00:00

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

Pretrained vision-language models can reconstruct 3D scenes from single images as editable Blender programs through progressive refinement, demonstrating improved fidelity through staged reconstruction approaches.

arXiv cs.CV TIER_1 English(EN) · Guangzhao He, Rundong Luo, Wei-Chiu Ma, Hadar Averbuch-Elor · 2026-06-02 04:00

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

arXiv:2606.02580v1. Abstract: Inverse graphics is a longstanding and highly underconstrained problem that seeks to reconstruct images as editable 3D scenes which can be rendered, relit, and manipulated. In this work, we investigate whether pretrained vision-lang…

arXiv cs.CV TIER_1 English(EN) · Hadar Averbuch-Elor · 2026-06-01 17:59

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

Inverse graphics is a longstanding and highly underconstrained problem that seeks to reconstruct images as editable 3D scenes which can be rendered, relit, and manipulated. In this work, we investigate whether pretrained vision-language models (VLMs) can perform executable invers…

