Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection

arXiv:2605.31563v1. Abstract: Human disagreement is ubiquitous and well-known in labeling. However, variation in explanations, captured through token-level human rationales, remains far less explored. At the same time, it is unclear how to best evaluate human labels and rationales -- or even how to best aggregate rationales beyond majority vote -- in light of this variation. Yet, rationales may provide additional insights into the richness of human reasoning, that may differ in style, values and interpretations -- especially in subjective NLP tasks like hate speech detection.
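To make the majority-vote baseline mentioned in the abstract concrete, here is a minimal sketch of aggregating token-level rationales. It assumes each annotator's rationale is a binary mask over the same tokens (1 = token marked as evidence). The function names, the placeholder tokens, and the masks are hypothetical and do not come from the paper; the per-token support score is one simple soft alternative, not the authors' proposed method.

```python
from typing import List


def majority_vote_rationale(masks: List[List[int]]) -> List[int]:
    """Keep a token only if strictly more than half of the annotators marked it."""
    if not masks:
        raise ValueError("need at least one annotator mask")
    n_tokens = len(masks[0])
    if any(len(m) != n_tokens for m in masks):
        raise ValueError("all masks must cover the same tokens")
    n_annotators = len(masks)
    # zip(*masks) iterates over tokens, yielding one vote per annotator.
    return [1 if 2 * sum(votes) > n_annotators else 0 for votes in zip(*masks)]


def token_support(masks: List[List[int]]) -> List[float]:
    """Fraction of annotators who marked each token (keeps the disagreement visible)."""
    n_annotators = len(masks)
    return [sum(votes) / n_annotators for votes in zip(*masks)]


# Hypothetical example: three annotators, four placeholder tokens.
tokens = ["t0", "t1", "t2", "t3"]
masks = [
    [0, 1, 0, 1],  # annotator A
    [0, 0, 0, 1],  # annotator B
    [1, 1, 0, 1],  # annotator C
]

aggregated = majority_vote_rationale(masks)
support = token_support(masks)

for tok, keep, frac in zip(tokens, aggregated, support):
    print(f"{tok}: majority={keep} support={frac:.2f}")
```

In this toy case the majority vote drops `t0` entirely, even though one annotator considered it evidence, while the support scores retain that minority signal. This is the kind of information loss that motivates looking beyond majority vote.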
The increasing sophistication and widespread application of AI models, particularly in subjective tasks like hate speech detection, necessitate a deeper understanding and more robust evaluation of their underlying reasoning.
Improved methods for evaluating human labels and explanations in subjective NLP tasks directly impact the reliability, fairness, and ethical deployment of AI systems, especially in sensitive domains.
This research highlights the need for new methodologies to assess AI explainability beyond simple agreement, potentially leading to more nuanced and robust evaluation frameworks for AI models.
- AI ethics research institutions
- NLP researchers
- Responsible AI developers
- Users of AI systems
- Developers of simplistic AI evaluation metrics
- Companies deploying unexplainable AI without scrutiny
AI models will be developed with a greater emphasis on interpretable and justifiable reasoning.
New standards and benchmarks for evaluating AI explainability will emerge, particularly in areas with subjective human judgment.
The legal and regulatory frameworks for AI accountability may incorporate requirements for rationale-based explainability in sensitive applications.
Read at arXiv cs.CL