Do We Still Need Humans in the Loop? Comparing Human and LLM Annotation in Active Learning for Hostility Detection

arXiv:2604.13899v3 (announce type: replace)
Abstract: Instruction-tuned LLMs can annotate thousands of instances at low cost. This raises two questions for active learning (AL): can LLM labels replace human labels within the AL loop, and does AL remain necessary when entire corpora can be cheaply labeled? We investigate both on a new dataset of 277,902 German political TikTok comments (25,974 LLM-labeled, 5,000 human-annotated), comparing LLM and human annotation across seven conditions, four encoders, and 10 random seeds. Under a two-question interface that mirrors the human annotation task, LL…
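The abstract's first question amounts to swapping the label source inside the AL loop while keeping everything else fixed. The sketch below is a minimal illustration under stated assumptions, not the study's method: it uses pool-based AL with uncertainty sampling and a toy one-dimensional logistic regression in place of the paper's encoders. The "LLM" labels are simulated by flipping a fixed subset of gold labels. All names (`run_active_learning`, `human_oracle`, `llm_oracle`) and the toy data are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# An oracle maps a pool index to a binary label (1 = hostile, 0 = not hostile).
Oracle = Callable[[int], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(xs: Sequence[float], ys: Sequence[int],
                 epochs: int = 200, lr: float = 0.5) -> Tuple[float, float]:
    """Fit p(y=1|x) = sigmoid(w*x + b) with full-batch gradient descent."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def run_active_learning(pool: Sequence[float], oracle: Oracle, seed_idx: Sequence[int],
                        rounds: int, batch: int) -> Tuple[List[int], Tuple[float, float]]:
    """Pool-based AL with uncertainty sampling; only the oracle changes between runs."""
    labeled: Dict[int, int] = {i: oracle(i) for i in seed_idx}

    def fit() -> Tuple[float, float]:
        idx = list(labeled)
        return train_logreg([pool[i] for i in idx], [labeled[i] for i in idx])

    for _ in range(rounds):
        w, b = fit()
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        # Query the unlabeled items whose predicted probability is closest to 0.5.
        unlabeled.sort(key=lambda i: abs(sigmoid(w * pool[i] + b) - 0.5))
        for i in unlabeled[:batch]:
            labeled[i] = oracle(i)
    return list(labeled), fit()


# Hypothetical toy data: 200 one-dimensional "comment scores" on an even grid,
# with gold label 1 whenever the score exceeds 0.3.
pool = [-3.0 + 6.0 * k / 199 for k in range(200)]
gold = [1 if x > 0.3 else 0 for x in pool]
# Simulated "LLM" labels: gold labels with every item where i % 7 == 5 flipped.
llm_labels = [1 - g if i % 7 == 5 else g for i, g in enumerate(gold)]


def human_oracle(i: int) -> int:
    return gold[i]


def llm_oracle(i: int) -> int:
    return llm_labels[i]


seeds = [0, 50, 100, 150, 199]
for name, oracle in [("human", human_oracle), ("llm", llm_oracle)]:
    queried, (w, b) = run_active_learning(pool, oracle, seeds, rounds=5, batch=10)
    acc = sum((sigmoid(w * x + b) > 0.5) == bool(g) for x, g in zip(pool, gold)) / len(pool)
    print(f"{name}: {len(queried)} labels queried, accuracy vs gold = {acc:.3f}")
```

Because the query strategy, model, and seed set are shared, any difference between the two runs comes from the label source alone. The same comparison logic holds when the toy classifier is replaced by a real encoder and the simulated noise by actual LLM annotations.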
Instruction-tuned LLMs can now perform complex annotation tasks at scale and at low cost, challenging human-centric labeling workflows in machine learning.
The findings bear on the cost, speed, and scalability of data labeling for AI models, and could shorten development cycles and change labor requirements for data annotation.
The assumed need for human annotators inside active learning loops for tasks such as hostility detection is being re-examined: the study tests whether LLM labels can substitute for human labels within the loop, and whether AL is still needed when an entire corpus can be labeled cheaply, with LLMs potentially replacing or substantially reducing human annotation effort.
- AI developers
- Companies with large data labeling needs
- LLM providers
- Sovereign AI initiatives
- Human data annotators
- Traditional data labeling services
Reduced costs and accelerated development timelines for AI models requiring large annotated datasets.
A shift in demand for human labor from direct annotation to oversight and validation of AI-generated labels, potentially creating new job categories.
Faster, cheaper data acquisition could improve AI systems across domains and broaden AI adoption.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.CL