
arXiv:2607.02049v1 Announce Type: new Abstract: Large Language Models are increasingly deployed in emotional-support contexts and crisis-related situations. Nevertheless, their cross-lingual abilities in these circumstances remain underexplored. Existing benchmarks emphasize multilingual performance but rarely examine crisis-related empathy and cultural grounding in low-to-mid-resource languages. We introduce SPLIT, a 500-prompt benchmark designed to evaluate LLM consistency in generating emotionally grounded responses across five categories: Stress, Panic, Loneliness, Internal Displacement, a
The increasing deployment of LLMs in sensitive emotional-support and crisis contexts calls for a deeper understanding of their cross-cultural and empathetic capabilities, particularly in non-dominant languages.
Evaluating LLM performance beyond basic multilingual proficiency to assess their empathetic and culturally grounded responses in critical situations is crucial for responsible AI deployment and mitigating potential harm.
A specialized benchmark such as SPLIT shifts how LLMs are assessed, pushing evaluations to account for cultural and emotional intelligence in crisis settings rather than multilingual proficiency alone.
- AI ethicists
- Developers of empathetic LLMs
- Humanitarian organizations
- Users in low-to-mid-resource language communities
- LLMs lacking cultural grounding
- Developers focused solely on general multilingualism
- Platforms deploying un-evaluated LLMs in crisis support
The benchmark will likely drive research and development towards LLMs with improved cross-lingual empathy and cultural grounding.
Enhanced LLM capabilities in these areas could lead to more effective and trusted AI-powered crisis support systems globally.
Successful integration of culturally sensitive AI could foster greater trust in AI systems overall, influencing broader adoption in public services.
Read at arXiv cs.CL