TrackRef3D: Multi-View Consistent Track-then-Label for Open-World Referring Segmentation in 3D Gaussian Splatting

arXiv:2605.26576v1 (announce type: cross)

Abstract: Referring 3D Gaussian Splatting (R3DGS), which utilizes natural language for 3D object segmentation, has emerged as a crucial capability for embodied AI. However, existing methods typically rely on expensive per-scene manual annotation and per-view pseudo mask generation, which suffer from multi-view inconsistency and poor generalization to varying query specificities. To address this, we present TrackRef3D, a fully automatic pipeline that achieves open-world referring segmentation in 3D Gaussian Splatting (3DGS) without manual annotation by in
The growing maturity of 3D Gaussian Splatting and rapid progress in embodied AI are converging, making robust 3D object segmentation a critical bottleneck.
This work addresses a significant challenge in open-world 3D scene understanding by reducing reliance on manual data annotation and improving generalization across queries of varying specificity.
The ability to perform open-world referring segmentation in 3D without extensive manual annotation lowers the barrier for developing more capable embodied AI systems.
- Embodied AI developers
- Robotics companies
- Generative AI platforms
- Computer vision researchers
- Companies relying on manual 3D data annotation for segmentation
Embodied AI systems will gain improved perception and interaction capabilities in complex, real-world environments.
The proliferation of more sophisticated embodied AI could accelerate automation in logistics, manufacturing, and service industries.
As embodied AI becomes more capable and ubiquitous, ethical and safety concerns regarding autonomous agents will intensify, potentially leading to new regulatory frameworks.
Read at arXiv cs.LG