The Ghost Annotator: a Framework to Explore Human Label Variation in Content Moderation through Conformal Prediction

arXiv:2606.02911v1. Abstract: Current research primarily focuses on model performance, while comparatively less attention has been devoted to uncertainty estimation, particularly in settings where LLMs are increasingly used to generate annotated data. We introduce a framework combining conformal prediction with Collaborative Filtering-style annotators' representation to model LLM behavior in relation to human annotators and to analyze patterns of agreement and disagreement. Using Non-Conformity Scores, we introduce the Ghost Prediction metric and the Ghost Annotator represent…
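The abstract builds on non-conformity scores from conformal prediction. The sketch below is a generic split conformal classifier, not the paper's Ghost Prediction metric, which the truncated abstract does not define. It assumes a score of 1 minus the model's probability for a label, treats human labels as the calibration targets, and uses made-up toy probabilities in place of real LLM outputs. Every function name here is hypothetical.

```python
import math

import numpy as np


def nonconformity_scores(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Assumed score: 1 - p(label), high when the model finds the label surprising."""
    return 1.0 - probs[np.arange(len(labels)), labels]


def conformal_threshold(cal_scores: np.ndarray, alpha: float) -> float:
    """Split conformal: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points for this alpha: keep every label
        return math.inf
    return float(np.sort(cal_scores)[k - 1])


def prediction_sets(probs: np.ndarray, qhat: float) -> list[list[int]]:
    """Keep every label whose non-conformity score is at most the threshold."""
    return [np.flatnonzero(1.0 - row <= qhat).tolist() for row in probs]


# Hypothetical toy data: class 0 = acceptable, class 1 = toxic.
# Calibration labels stand in for human annotations; probabilities for LLM outputs.
cal_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.8, 0.2]])
cal_labels = np.array([0, 1, 1, 1, 0])

scores = nonconformity_scores(cal_probs, cal_labels)
qhat = conformal_threshold(scores, alpha=0.2)
sets = prediction_sets(np.array([[0.75, 0.25], [0.5, 0.5]]), qhat)
print(sets)  # [[0], [0, 1]]
```

Under exchangeability, sets built this way contain the human label with probability at least 1 - alpha. A set holding both labels, as for the 50/50 item above, flags content where the model's uncertainty spans competing judgments. This is the kind of agreement-and-disagreement signal the framework analyzes, though the paper's own scoring may differ.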
The increasing reliance on LLMs for data generation and content moderation highlights an urgent need for better uncertainty estimation and understanding of human-LLM agreement patterns.
The framework offers a quantitative way to assess LLM behavior relative to human annotators. That kind of assessment is critical for developing reliable and fair AI systems, especially in sensitive areas like content moderation.
The 'Ghost Prediction' metric and 'Ghost Annotator' representation provide new tools for evaluating, understanding, and potentially improving agreement between AI-generated and human-annotated data.
Who stands to benefit:
- AI developers
- Content moderation platforms
- Researchers in AI ethics
Who faces pressure:
- Platforms with opaque AI content moderation
- Systems relying on unvalidated LLM annotations
Improved understanding of LLM annotation reliability and human-AI alignment in content moderation.
Development of more robust and transparent AI systems for content moderation, leading to fairer outcomes.
Increased public trust in AI-driven content moderation and data generation processes, potentially influencing regulatory approaches.
Source: arXiv cs.CL