
arXiv:2606.17321v1 (Announce Type: new)
Abstract: Training computer-use agents (CUAs) -- models that interact with graphical desktops through screenshots and keyboard/mouse actions -- requires large-scale, diverse trajectory data collected in full desktop environments. The largest public resource, AgentNet (22.5K human trajectories), leads to negative transfer when used for supervised fine-tuning (SFT): continuing training UI-TARS 7B on AgentNet causes OSWorld success rate to fall from 26.3% to 8-10%. We present ProCUA-SFT, a dataset of 3.1M step-level SFT samples distilled from 93K synthetic tr
As AI models continue to scale, training computer-use agents demands more efficient methods, which has motivated datasets such as ProCUA-SFT.
Autonomous AI systems that handle complex desktop interactions depend on computer-use agents that train efficiently and perform well.
Training agents on refined synthetic data, with higher success rates and less negative transfer, makes them more capable and more practical to deploy.
- AI agent developers
- Automation software providers
- Businesses leveraging AI for operational tasks
- AI research institutions
- Companies reliant on manual digital labor
- Inefficient AI training data providers
Increased efficacy of AI agents interacting with graphical user interfaces.
Acceleration in the development and deployment of autonomous AI for a wider range of digital tasks.
Significant reorganization of white-collar work due to highly capable AI agents replacing or augmenting human roles.
Read at arXiv cs.LG