Are Online Skill and Memory Modules Always Worth Their Tokens? A Budget-Constrained Study of Web Agents

arXiv:2606.15017v1

Abstract: Online web agents often augment a base actor with memory, workflow, or skill modules. These modules can improve performance, but they also consume test-time tokens, a cost rarely reported alongside the actor's inference cost. We study online augmentation, where this overhead is paid on every task, and re-evaluate its benefits under a fixed total inference budget. We compare AWM, ASI, and ReasoningBank with a token-matched vanilla baseline that uses the same budget for additional actor steps. Across three WebArena domains and three models, Gemini
As AI web agents move from research to deployment, their real-world operating costs need closer scrutiny.
This study highlights the trade-off between the added complexity of memory, workflow, and skill modules and token efficiency, which directly affects the economic viability and deployment strategy of autonomous systems.
The evaluation of AI agent performance will increasingly need to factor in test-time token consumption, potentially shifting design priorities towards more budget-constrained architectures.
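To make the accounting concrete, here is a minimal sketch of how module overhead could be folded into a budget comparison. It is not the paper's evaluation code: the `TaskRun` record, the helper names, and all numbers are hypothetical, and the per-step cost is assumed to be the actor's average over the run.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TaskRun:
    """One task attempt (hypothetical record, not from the paper)."""
    actor_tokens: int    # tokens consumed by the base actor
    module_tokens: int   # tokens consumed by memory/skill modules (0 for vanilla)
    steps: int           # actor steps taken
    success: bool


def total_tokens(run: TaskRun) -> int:
    """Full test-time cost: actor plus module overhead."""
    return run.actor_tokens + run.module_tokens


def matched_extra_steps(augmented: TaskRun) -> int:
    """Extra actor steps a vanilla baseline could afford if it spent the
    augmented run's module overhead on acting instead (assumes a constant
    average token cost per step)."""
    if augmented.steps == 0:
        return 0
    tokens_per_step = augmented.actor_tokens / augmented.steps
    return int(augmented.module_tokens // tokens_per_step)


def success_per_million_tokens(runs: Iterable[TaskRun]) -> float:
    """Successes per million tokens, counting module overhead."""
    runs = list(runs)
    spent = sum(total_tokens(r) for r in runs)
    wins = sum(1 for r in runs if r.success)
    return 1e6 * wins / spent if spent else 0.0


# Illustrative numbers only.
augmented = TaskRun(actor_tokens=20_000, module_tokens=5_000, steps=10, success=True)
vanilla = TaskRun(actor_tokens=25_000, module_tokens=0, steps=12, success=False)
extra = matched_extra_steps(augmented)
rate = success_per_million_tokens([augmented, vanilla])
```

Reporting a success rate against `total_tokens` rather than actor tokens alone is what keeps the comparison between an augmented agent and a plain one on equal footing.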
- Developers of efficient, lean AI agent architectures
- Cloud providers with competitive token pricing
- Businesses prioritizing cost-effective automation
- Overly complex or token-intensive AI agent designs
- Developers neglecting operational token costs
- Organizations with unlimited compute budgets for agents
AI agent development will increasingly focus on token efficiency and optimization alongside performance metrics.
Models and frameworks that achieve high performance within strict token budgets will hold a competitive advantage.
The concept of 'agent efficiency' will become a key differentiator, influencing commercial adoption and market share in the AI agent space.
Read at arXiv cs.CL