A Cross-Model VLM-Judge Protocol for Single-Image 3D Mesh Quality (and Why Cheap Proxies Fall Short)

arXiv:2606.18451v1. Abstract: Single-image-to-3D generators are improving quickly, but there is no agreed, human-free way to tell whether one generated mesh is better than another. Practitioners commonly rely on cheap automatic proxies (render-space CLIP similarity and mesh geometry-validity statistics), yet how well these track perceived quality is unestablished. We make two contributions. First, we propose and validate a reproducible VLM-judge evaluation protocol: a fixed 24-view headless render rig, two independent vision-language judge families, and a mandatory position-b
Single-image-to-3D generators are advancing faster than the methods used to evaluate them, and the paper argues that the cheap proxies in common use fall short as measures of perceived quality.
A robust, human-free evaluation protocol for 3D mesh quality would speed up development and make benchmarks in 3D generation more reliable.
The proposed VLM-judge protocol offers a standardized, reproducible way to compare generated meshes without human raters and without relying on cheap proxies such as render-space CLIP similarity and mesh geometry-validity statistics.
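To make the idea of a fixed multi-view render rig concrete, here is a minimal sketch of how 24 camera positions might be laid out around a mesh centered at the origin. The abstract states only that the rig has 24 views; the split into 8 azimuths at 3 elevations, the elevation angles, the radius, and the function name `camera_rig` are all illustrative assumptions, not the paper's configuration.

```python
import math

def camera_rig(n_azimuth=8, elevations_deg=(-20.0, 10.0, 40.0), radius=2.5):
    """Return (x, y, z) camera positions on a sphere around the origin.

    Hypothetical layout: n_azimuth evenly spaced azimuths at each elevation.
    The source specifies 24 views but not how they are arranged.
    """
    cameras = []
    for elev in elevations_deg:
        phi = math.radians(elev)  # elevation above the horizontal plane
        for k in range(n_azimuth):
            theta = 2.0 * math.pi * k / n_azimuth  # azimuth around the z-axis
            cameras.append((
                radius * math.cos(phi) * math.cos(theta),
                radius * math.cos(phi) * math.sin(theta),
                radius * math.sin(phi),
            ))
    return cameras

views = camera_rig()
print(len(views))  # 8 azimuths x 3 elevations = 24 views
```

Because the positions are computed deterministically from fixed parameters, every mesh is rendered from the same viewpoints, which is what makes comparisons across generators repeatable.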
- 3D content creators
- AI researchers in 3D generation
- Generative AI platforms
- Robotics and simulation industries
- Developers relying solely on 'cheap proxies'
- Companies with inferior 3D generation models
Improved 3D generative models due to clearer performance feedback.
Faster adoption of 3D generation in various industries, from e-commerce to gaming and industrial design.
Greater realism and fidelity in virtual and augmented reality applications, narrowing the gap between digital and physical objects.
Read at arXiv cs.LG