A Cross-Model VLM-Judge Protocol for Single-Image 3D Mesh Quality (and Why Cheap Proxies Fall Short)
Researchers have developed a cross-model VLM-judge protocol for evaluating the quality of 3D meshes generated from single images. The protocol combines a fixed rendering setup with multiple vision-language model judges, and the judges reach substantial agreement with one another. The study also found that common automatic proxies, namely render-space CLIP similarity and mesh geometry-validity statistics, do not track perceived quality and can be misleading.
IMPACT: Establishes a more reliable benchmark for evaluating single-image 3D mesh generation, potentially guiding future model development.
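
To make the "substantial agreement between judges" claim concrete, here is a minimal sketch of one common way to measure agreement between two judges: Cohen's kappa. The summary does not say which agreement statistic the study uses, so kappa is an assumption here. The function name `cohen_kappa` and the "good"/"bad" labels are hypothetical illustrations, not taken from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges labelling the same items.

    Assumption: each judge assigns one categorical label per mesh.
    The study's actual agreement statistic is not given in the summary.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both judges must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items where the two judges match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each judge's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both judges used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two VLM judges on four rendered meshes.
judge_1 = ["good", "good", "bad", "bad"]
judge_2 = ["good", "good", "bad", "good"]
print(cohen_kappa(judge_1, judge_2))
```

Kappa corrects raw percent agreement for the agreement expected by chance, which matters when one label dominates. Under the widely used Landis and Koch convention, values between 0.61 and 0.80 are described as "substantial"; whether the study uses that convention is not stated in the summary.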