
arXiv:2512.20014v3. Abstract: While Vision-Language-Action (VLA) models generalize well to generic instructions, they struggle with personalized commands such as "bring my cup," where the robot must act on one specific instance among visually similar objects. We study this setting of manipulating personal objects, in which a VLA must identify and control a user-specific object unseen during training using only a few reference images. To address this challenge, we propose Visual Attentive Prompting (VAP), a simple-yet-effective training-free perceptual adapter that e
As vision-language models and robotic platforms become more widespread, robots increasingly need to handle personalized instructions rather than only generic ones.
VAP targets a known limitation of current VLA models: they handle generic instructions well but struggle to single out one user's object among visually similar ones. Addressing this would make human-robot collaboration in personal environments more practical.
According to the abstract, VAP is a training-free adapter that lets a VLA model identify and manipulate a user-specific object from only a few reference images, so personalization would not require retraining the model.
- Robotics companies
- AI software developers
- Smart home device manufacturers
- Elderly care services
- Companies relying on non-personalized, generic robot interactions
Robots could perform personalized tasks, such as fetching one person's cup from among similar ones, making them more useful in home and work settings.
Robots may see wider adoption in highly personalized environments as they become more useful and easier to adapt.
Longer term, this could lead to truly "personal" robot companions that understand and cater to individual users' needs and preferences across many domains.
Read at arXiv cs.AI