3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code
Researchers have introduced 3DCodeBench, a new benchmark that evaluates how well vision-language models (VLMs) generate procedural 3D models through code. The benchmark pairs a dataset of multimodal prompts with corresponding procedural code, and adds a human preference ranking platform called 3DCodeArena. Evaluations show that VLMs often struggle with API mismatches and geometric inconsistencies, although their performance improves with more reasoning and refinement.
IMPACT This benchmark could accelerate the development of AI agents capable of complex 3D content creation, with implications for game development and virtual environments.