TAME: A Trustworthy Test-Time Evolution of Agent Memory with Systematic Benchmarking

arXiv:2602.03224v2 Announce Type: replace-cross Abstract: Test-time evolution of agent memory represents a pivotal paradigm for advancing AGI, as it strengthens complex reasoning through experience accumulation without requiring parameter updates. However, even during benign task evolution, agent safety alignment remains vulnerable, a phenomenon known as Agent Memory Misevolution. To evaluate this phenomenon, we construct the Trust-Memevo benchmark and find that agents exhibit an overall decline in trustworthiness across multiple tasks during benign task evolution. To address this issue, we pr

Source: arXiv cs.LG — read the full report at the original publisher.

This is a curated wire item. The Continuum Brief does not republish full third-party articles; this entry links to the original source.

Stay ahead of the systems reshaping markets.