
arXiv:2605.25454v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly being used for emotional support. They are also being developed for formal therapy purposes. However, LLMs like ChatGPT or Llama are often developed with content moderation guardrails that prevent them from discussing sensitive subjects with users for both liability and safety purposes, and this inability to broach these subjects may affect their capacity as therapists. In this study, we perform an algorithm audit on three state-of-the-art moderation systems (OpenAI's moderation endpoint, Meta's Ll
The growing deployment of LLMs in sensitive applications like mental health necessitates immediate research into their inherent ethical and practical limitations, particularly concerning content moderation.
This research highlights a critical bottleneck for AI's adoption in healthcare, as current moderation practices designed for general use may undermine the therapeutic efficacy of LLMs.
The focus for developing LLMs for therapy will shift towards nuanced moderation systems that balance safety with the necessity to discuss sensitive topics, potentially leading to specialized AI models.
- AI ethicists
- Developers of custom moderation systems
- Mental health tech platforms
- Policy makers
- General-purpose LLM providers without specialized moderation
- Mass-market AI therapy solutions
Algorithmic audits will become a standard and critical part of deploying AI in mental health applications.
This could lead to legal and regulatory frameworks specifically addressing AI content moderation in therapeutic contexts.
A new industry niche may emerge for 'therapeutic AI' platforms that prioritize nuanced, context-aware content moderation over blunt safety guardrails.
Read at arXiv cs.CL