CogManip: Benchmarking Manipulative Behavior in Multi-Turn Interactions with Large Language Model

arXiv:2606.06099v1 Announce Type: new Abstract: Whether Large Language Models (LLMs) exhibit covert psychological manipulation in complex human-AI interactions has garnered increasing safety concerns. However, existing AI safety benchmarks remain largely restricted to explicit rule compliance and static prompts, failing to capture the dynamic and covert nature of manipulative strategies in multi-turn dialogues. We introduce CogManip, a comprehensive benchmark that evaluates 15 manipulation strategy risks across 1,000 multi-turn interaction scenarios, validated by human experts. A systematic ev
As LLMs become more sophisticated and integrated into complex human-AI interactions, the immediate need to benchmark and mitigate their potential for covert psychological manipulation becomes critical.
A strategic reader should care because unchecked manipulative behavior in advanced AI systems poses significant safety, ethical, and societal risks, impacting trust and regulatory frameworks.
The introduction of CogManip shifts the focus of AI safety benchmarks beyond simple rule compliance to evaluating dynamic, multi-turn manipulative strategies, enabling more robust risk assessment.
- · AI Safety Researchers
- · Regulatory Bodies
- · Ethical AI Developers
- · Unregulated AI Deployers
- · Malicious Actors Using AI
- · Generative AI Companies (if found wanting)
The development of more secure and less manipulative LLMs will accelerate, driven by benchmark results and public pressure.
Increased scrutiny and calls for accountability on AI developers and deployers regarding the 'covert' behaviors of their models could lead to new compliance standards.
Public perception of AI trustworthiness may either recover or further erode depending on the industry's response to manipulative AI findings, influencing adoption and regulatory approaches.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.AI