Majority Vote Silences Minority Values: Annotator Disagreement at the Hate/Offensive Boundary in HateXplain

arXiv:2606.28772v1 Announce Type: cross Abstract: Hate speech annotation pipelines routinely collapse annotator disagreement into majority vote labels before training. We show that this aggregation is not neutral: 42.6% of all annotator disagreement in HateXplain concentrates specifically at the hate/offensive boundary, a pattern consistent with annotators applying different thresholds for where hate begins (chi-squared = 135.199, df = 2, p < 0.0001). Both a hard-label BERT model (Model A) and a soft-label model (Model B) drop 22 percentage points in accuracy from agreed posts (~80%) to disagr
The proliferation of AI models for content moderation highlights the critical need for nuanced human annotation and robust validation methods, making this research timely.
This research reveals a fundamental challenge in AI training data curation, particularly for sensitive categories like hate speech, impacting model reliability and fairness.
The understanding that majority vote in annotation pipelines can systematically obscure minority values, especially at critical classification boundaries, thereby biasing model performance.
- · Researchers in AI ethics and fairness
- · Developers of advanced annotation tools
- · Platforms seeking more accurate content moderation
- · Developers relying solely on majority vote for annotation aggregation
- · Users impacted by biased AI moderation systems
AI models trained on aggregated majority-vote data will continue to exhibit biases at the hate/offensive boundary.
Increased investment in multi-perspective annotation techniques and AI models capable of handling and learning from disagreement.
The development of 'disagreement-aware' AI systems that can explain why certain content might be categorized differently by various annotators, fostering greater transparency and trust.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.AI