
arXiv:2606.00168v1 Announce Type: new
Abstract: AI systems are increasingly deployed in conversational settings where users may be uncertain whether they are speaking with a human or an AI. Despite mounting regulatory attention to this known safety risk, existing evaluations of AI disclosure are typically English-only, based on machine-generated questions, and restricted to text. We present RealityTest to comprehensively test whether AI systems disclose their identity when asked. The benchmark is the first large-scale multimodal and multilingual evaluation, grounded in human data on how people
As AI is increasingly deployed in conversational settings, regulators and the public are grappling with the need for transparency about whether a user is talking to an AI.
The benchmark targets a recognized safety risk and regulatory concern: users may not know whether they are speaking with a human or an AI, which bears on both user trust and legal frameworks.
A multimodal, multilingual evaluation offers a standardized way to assess AI disclosure, moving beyond earlier evaluations that are English-only, restricted to text, and based on machine-generated questions.
- Regulatory bodies
- AI ethics researchers
- Users of conversational AI
- AI developers circumventing disclosure
- Companies relying on AI deception
Increased pressure on AI developers to implement robust disclosure mechanisms.
Potential for new product features or compliance requirements related to AI identity disclosure.
Improved public perception of, and trust in, AI systems as a result of greater transparency.
Read at arXiv cs.CL