Momento: Evaluating Persistent Memory and Reasoning with Multi-Session Agentic Conversations

arXiv:2606.00832v1. Abstract: Recent advances in agentic AI have enabled agents to complete complex tasks through tool use, reasoning, and multi-step planning. Yet existing benchmarks evaluate agents within a single session, ignoring past actions, stated preferences, and prior decisions that agents must integrate to fulfill personalized user goals. We introduce Momento, a benchmark for persistent agentic task completion in multi-session service environments, requiring agents to take consequential, tool-mediated actions while resolving temporal dependencies and evolving user g
As AI agents grow more capable, the limits of single-session evaluation become harder to ignore: existing benchmarks do not test whether an agent can carry context from one session to the next.
Momento targets that gap by evaluating persistent memory and multi-session reasoning, which agents need in order to handle complex, real-world tasks and personalized user interactions.
The benchmark shifts the focus of agent evaluation from isolated tasks to continuous, multi-session interactions, pushing developers to build agents that integrate past actions and evolving preferences.
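To make the multi-session idea concrete, here is a minimal Python sketch of the kind of check such an evaluation implies. It is not Momento's actual harness or API; `MemoryStore`, `book_seat`, and the seat-preference scenario are all hypothetical names chosen for illustration. The sketch assumes that a later stated preference overrides an earlier one, and compares an agent with cross-session memory against one without.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical cross-session memory holding (session, key, value) records."""
    records: list = field(default_factory=list)

    def remember(self, session: int, key: str, value: str) -> None:
        self.records.append((session, key, value))

    def recall(self, key: str):
        # Resolve the temporal dependency: the most recent statement wins.
        matches = [r for r in self.records if r[1] == key]
        if not matches:
            return None
        return max(matches, key=lambda r: r[0])[2]


def book_seat(memory) -> str:
    """Toy tool-mediated action: pick a seat using any remembered preference."""
    preference = memory.recall("seat") if memory is not None else None
    return preference or "no_preference"


# Session 1: the user states a preference. Session 2: the user revises it.
memory = MemoryStore()
memory.remember(session=1, key="seat", value="window")
memory.remember(session=2, key="seat", value="aisle")

# Session 3: the task succeeds only if the latest preference is applied.
expected = "aisle"
persistent_result = book_seat(memory)   # agent with cross-session memory
stateless_result = book_seat(None)      # agent that sees only this session

persistent_ok = persistent_result == expected
stateless_ok = stateless_result == expected
print(persistent_ok, stateless_ok)
```

In this toy setup the stateless agent has nothing to recall in the third session, which is the failure mode a single-session benchmark would never surface.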
- AI agent developers
- Companies building personalized AI services
- Developers of memory architectures for AI
- AI agents lacking persistent memory
- Benchmarks limited to single-session evaluations
- Developers focused solely on short-term task completion
If benchmarks like Momento drive progress, AI agents could become markedly better at handling complex, personalized user workflows that span many sessions.
That capability could in turn accelerate the adoption of AI agents in service industries, particularly customer support, personal assistants, and workflow automation.
Over the longer term, highly persistent, reasoning agents could lead to new forms of human-AI collaboration and autonomous decision-making in personal and professional contexts.
Read at arXiv cs.CL