
arXiv:2606.31644v1 Announce Type: new Abstract: As large language models take on morally consequential roles in healthcare, legal, and hiring contexts, we need to examine whether their ethical behaviors are genuine or superficial. We show that current fairness evaluations substantially overestimate moral safety. Models appear fair when demographic identity is stated as an explicit label, yet become measurably less fair when the same identity must be inferred. We term this failure \emph{performative compliance}, where a model is fair when the presentation resembles a fairness evaluation and les
The increasing deployment of LLMs in critical real-world applications (healthcare, legal, hiring) necessitates a deeper understanding of their ethical boundaries beyond superficial compliance.
This research reveals a fundamental flaw in current LLM ethical evaluations, indicating that models may not be genuinely fair, which has significant implications for trust, regulation, and societal impact.
The understanding of LLM 'moral safety' shifts from a purely compliance-based view to one that demands more robust and nuanced evaluation methods, especially regarding inferred attributes.
- · AI ethics researchers
- · LLM auditing firms
- · Regulatory bodies
- · Model evaluators
- · LLM developers relying on superficial fairness metrics
- · Organizations deploying LLMs without robust ethical testing
Existing LLM ethical evaluations are exposed as insufficient, requiring immediate re-evaluation and more sophisticated testing methodologies.
Increased pressure on LLM developers to implement and demonstrate 'genuine' ethical behavior rather than 'performative compliance', potentially slowing deployment or increasing development costs.
The concept of 'performative compliance' could become a new standard in AI ethics discourse, leading to a broader overhaul of how AI systems are assessed for societal impact beyond mere rule-following.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.CL