PageLLM: A Multi-Grained Reward Framework for Whole-Page Optimization with Large Language Models

arXiv:2506.09084v2. Abstract: Whole-page optimization (WPO) decides how search and recommendation results are surfaced to users, and large language models (LLMs) open a new route to it by treating page generation as sequence generation. Adapting LLMs to web-scale WPO, however, remains bottlenecked by the need for costly human annotations and by the mismatched granularity between page-level coherence and item-level placement. In this work we show that these two challenges are coupled: implicit user feedback alone suffices for alignment, provided the reward signal is decoup
The rapid advancement and increased accessibility of large language models are creating new opportunities to apply them to complex optimization problems such as whole-page generation.
This points to a more sophisticated use of AI: automating the optimization of digital interfaces, with potentially significant implications for user experience and economic outcomes.
Optimization of web and app pages could shift from heuristic-driven design to LLM-driven sequence generation, reducing manual effort and improving efficiency.
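The abstract contrasts page-level coherence with item-level placement and says implicit user feedback can drive alignment. As a rough illustration only, the sketch below computes a click-based reward at both granularities and keeps the two signals separate. The `Impression` schema, the function names, and the click-based reward definitions are all assumptions made for this example; they are not taken from the paper, whose actual reward design is not given in the truncated abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Impression:
    """One item shown on a page, with implicit feedback (hypothetical schema)."""
    item_id: str
    clicked: bool


def item_level_rewards(page: List[Impression]) -> List[float]:
    """Per-position reward: 1.0 for a clicked item, 0.0 otherwise (assumed)."""
    return [1.0 if imp.clicked else 0.0 for imp in page]


def page_level_reward(page: List[Impression]) -> float:
    """Page-wide reward: fraction of shown items that were clicked (assumed)."""
    if not page:
        return 0.0
    return sum(1 for imp in page if imp.clicked) / len(page)


def multi_grained_rewards(page: List[Impression]) -> Tuple[float, List[float]]:
    """Return the page-level and item-level signals as two separate values."""
    return page_level_reward(page), item_level_rewards(page)


# A generated page is treated as an ordered sequence of items.
page = [
    Impression("item_a", clicked=True),
    Impression("item_b", clicked=False),
    Impression("item_c", clicked=False),
    Impression("item_d", clicked=True),
]
page_reward, item_rewards = multi_grained_rewards(page)
```

Keeping the two values apart, rather than blending them into one scalar, lets a training loop weight whole-page quality and individual placements independently; how PageLLM actually does this should be checked against the paper itself.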
- Digital platforms (e.g., e-commerce, search engines)
- AI/ML developers and researchers
- Users benefiting from optimized interfaces
- Companies with strong data infrastructure
- Traditional A/B testing service providers specializing in manual optimization
- Companies slow to adopt advanced AI optimization
- Teams focused on rule-based page design
Web and app interfaces become significantly more dynamic and personalized, adapting in real time based on user interaction.
This improved digital experience drives higher engagement and conversion rates, leading to increased revenue for platforms employing these models.
Automating UI/UX optimization could require existing design and product teams to re-skill, shifting their focus from granular placement decisions to higher-level strategy.
Read at arXiv cs.LG