arXiv:2603.03915v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have shown remarkable potential in developing role-playing agents (RPAs). However, current evaluation frameworks rely heavily on well-known fictional characters, raising a critical concern: models may be leveraging their internal training memory of these characters rather than demonstrating role-playing capabilities. This reliance often leads to significant performance degradation when RPAs encounter unseen or out-of-distribution personas. To address this, we propose a more rigorous evaluation protocol desig
Source: arXiv cs.AI — read the full report at the original publisher.
