
arXiv:2605.26144v1 Announce Type: cross Abstract: We present VISTA (VIsual Spec-To-App Benchmark), a benchmark for evaluating the end-to-end web-app generation capabilities of LLM-based agents. Unlike prior code generation benchmarks that focus on algorithmic tasks, VISTA targets realistic UI-centric development, where agents must produce functional, visually coherent applications from underspecified inputs. We define five prompt-information conditions that vary along two axes, visual/structural fidelity and stack constraint: (1) text only with free stack choice, (2) text with reference screen
The rapid advancement of large language models is enabling attempts to automate increasingly complex tasks, making full web-app generation a logical next frontier for benchmarking agentic capabilities.
A robust benchmark for visual spec-to-web-app coding agents makes progress toward automating full-stack development measurable. Such automation could compress software delivery cycles and substantially lower the barrier to application creation.
The focus of AI code generation research is shifting from algorithmic tasks to end-to-end, visually driven application development, highlighting the growing sophistication of AI agents in UI/UX and full-stack integration.
- Software developers (augmented)
- Small businesses/startups
- Cloud infrastructure providers
- AI agent developers
- Junior web developers
- Low-code/no-code platforms (legacy)
- Traditional software development agencies
- Manual UI/UX designers
AI agents will be able to generate functional and visually coherent web applications from high-level specifications.
This will lead to an explosion in custom web application development, accessible even to non-technical users, fundamentally altering the software development landscape.
The proliferation of easily generated, custom web applications could reshape industries reliant on bespoke software, leading to new business models and increased digital transformation across sectors.