ReCoVLA: VLM-Guided Reward Compilation for Failure Recovery in Vision-Language-Action Policies

arXiv:2606.09630v1 (announce type: cross). Abstract: Vision-language-action (VLA) policies provide strong priors for language-conditioned manipulation, but remain brittle in off-nominal states requiring targeted recovery. We propose ReCoVLA -- a failure-conditioned residual recovery framework that keeps a pretrained VLA policy frozen, uses an external vision-language model (VLM) to infer the failure mode and recovery stage, and compiles a structured reward from task-relevant components. Rather than using the VLM to generate actions or rewards directly, ReCoVLA uses it as a semantic reward selecto
More capable vision-language models (VLMs) and more complex robotic manipulation tasks are increasing the need for robust failure recovery in learned robot policies.
This research targets a limitation named in the abstract: VLA policies remain brittle in off-nominal states that require targeted recovery. Addressing it would make language-conditioned robots more resilient in dynamic, real-world environments.
Per the abstract, ReCoVLA keeps the pretrained VLA policy frozen and uses an external VLM to infer the failure mode and recovery stage, then compiles a structured reward from task-relevant components. Recovery behavior is therefore added without retraining the core policy, which could improve reliability and reduce development overhead.
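The mechanism described above can be illustrated with a minimal sketch. The abstract does not specify failure modes, recovery stages, reward components, or how the residual is applied, so every name below (`REWARD_TABLE`, `compile_reward`, `recovery_action`, the state keys, and the weights) is a hypothetical placeholder, not ReCoVLA's actual design. The sketch shows only the general idea: a diagnosis selects and weights predefined reward terms, and a correction is added on top of a frozen policy's action.

```python
from typing import Callable, Dict, Tuple

# Hypothetical state representation: a flat dict of scalar features.
State = Dict[str, float]

# Hypothetical task-relevant reward components (assumed, not from the paper).
COMPONENTS: Dict[str, Callable[[State], float]] = {
    "reach": lambda s: -s["gripper_to_object_dist"],  # closer is better
    "grasp": lambda s: s["grasp_contact"],            # 1.0 when in contact
    "lift": lambda s: s["object_height"],             # higher is better
}

# Hypothetical lookup: (failure_mode, recovery_stage) -> component weights.
# In the described framework a VLM would supply the key; here it is given.
REWARD_TABLE: Dict[Tuple[str, str], Dict[str, float]] = {
    ("missed_grasp", "reapproach"): {"reach": 1.0},
    ("missed_grasp", "regrasp"): {"reach": 0.5, "grasp": 1.0},
    ("dropped_object", "relift"): {"grasp": 0.5, "lift": 1.0},
}


def compile_reward(failure_mode: str, stage: str) -> Callable[[State], float]:
    """Build a reward function as a weighted sum of selected components."""
    weights = REWARD_TABLE[(failure_mode, stage)]

    def reward(state: State) -> float:
        return sum(w * COMPONENTS[name](state) for name, w in weights.items())

    return reward


def recovery_action(base_action: Tuple[float, ...],
                    residual: Tuple[float, ...]) -> Tuple[float, ...]:
    """Add a learned residual to the frozen policy's action (assumed additive)."""
    return tuple(b + r for b, r in zip(base_action, residual))
```

A usage example under the same assumptions: `compile_reward("missed_grasp", "regrasp")` returns a function that scores a state by half the negative gripper-to-object distance plus the grasp-contact signal. The design point this illustrates is that the VLM only picks among predefined reward terms rather than emitting numeric rewards or actions itself.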
- Robotics developers
- Automation industries
- AI software providers
- Human supervisors in automated environments (less intervention needed)
If the approach generalizes, robots could complete complex tasks more autonomously and reliably.
That reliability could in turn accelerate the adoption of robotic automation in new sectors and reduce operational costs.
More capable and autonomous robots could lead to shifts in labor markets and increased demand for advanced AI skills.
Read at arXiv cs.LG