FurnitureVLA: Learning Long-Horizon Bimanual Furniture Assembly with Vision-Language-Action Model

arXiv:2607.01212v1 Announce Type: cross Abstract: Current work on robot furniture assembly mostly focuses on toy-scale settings or single-arm manipulation. We introduce FurnitureVLA, the first systematic study of real-scale bimanual furniture assembly using Vision-Language-Action models (VLAs). We formalize the task, develop a scalable simulation pipeline for expert data generation and evaluation, and build a VR teleoperation system for single-operator bimanual control to collect high-quality real-world demonstrations. To address extreme long-horizon assembly with up to 7 subtasks and 1550 con
Recent advances in Vision-Language Models (VLMs) and growing computational capacity are enabling complex robotic manipulation tasks that were previously considered infeasible.
This research is a step toward autonomous bimanual robot assembly at real-world scale, with potential implications for future automation in manufacturing and logistics.
Robots are moving beyond toy-scale, single-arm tasks toward real-scale bimanual operations for long-horizon assembly, which could broaden their applications in unstructured environments.
- Robotics manufacturers
- Automation industry
- Logistics and supply chain
- Furniture manufacturing
- Manual assembly labor
- Companies reliant on low-efficiency assembly
Increased efficiency and reduced labor costs in specific assembly tasks.
Accelerated development of general-purpose bimanual robots for industrial and potentially domestic use.
Broader adoption of AI-driven automation leading to significant shifts in workforce demands and economic structures.
Read at arXiv cs.AI