Learning from Annotation Uncertainty: Entropy-Aware Curriculum for Speech Emotion Recognition

arXiv:2606.27536v1. Abstract: Speech emotion recognition (SER) often relies on hard consensus labels that collapse annotator disagreement. We study distribution-based supervision for 9-class SER on MSP-Podcast 2.0 using a WavLM-Base multitask model for categorical emotion and dimensional VAD. Hard-label training is compared with targets from primary and merged primary-secondary annotator vote distributions. Distributional objectives improve alignment with human vote distributions, reducing JSD/KLD relative to hard-label training. Analysis shows that hard supervision partly
As more AI systems interact with human speech and emotion, training data needs to capture more than a single consensus label per utterance.
Handling annotation uncertainty better could make speech emotion recognition more robust and closer to human judgment, with relevance to several industries.
This research suggests that training on annotator vote distributions, rather than on one hard label, yields predictions that align more closely with how human annotators actually vote.
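As a rough illustration of the distribution-based supervision the abstract describes, the following Python sketch turns raw annotator votes into a soft target and compares it with a model output using KL and Jensen-Shannon divergence. The four-class label set, the example votes, and the predicted probabilities are hypothetical; the paper uses nine classes on MSP-Podcast 2.0, and its exact loss and curriculum are not given in the excerpt above. The entropy helper is included only because the title references entropy; how the paper's curriculum uses it is not stated here.

```python
import math
from collections import Counter

# Hypothetical label set for illustration (the paper uses 9 classes).
CLASSES = ["angry", "happy", "sad", "neutral"]


def vote_distribution(votes, classes=CLASSES):
    """Normalize annotator votes into a probability distribution (soft label)."""
    counts = Counter(votes)
    total = len(votes)
    return [counts[c] / total for c in classes]


def hard_label(votes, classes=CLASSES):
    """One-hot plurality label; ties go to the first class in `classes`."""
    dist = vote_distribution(votes, classes)
    winner = dist.index(max(dist))
    return [1.0 if i == winner else 0.0 for i in range(len(classes))]


def entropy(p):
    """Shannon entropy in nats; higher means more annotator disagreement."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; q is clipped at eps to avoid division by zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in nats (symmetric, bounded by ln 2)."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical votes from five annotators for one utterance.
votes = ["happy", "happy", "neutral", "happy", "sad"]
soft = vote_distribution(votes)   # keeps the disagreement
hard = hard_label(votes)          # collapses it to a single class

# Hypothetical model output, standing in for softmax probabilities.
pred = [0.05, 0.60, 0.15, 0.20]

print("soft target:", soft)
print("annotation entropy:", entropy(soft))
print("KL(soft || pred):", kl_divergence(soft, pred))
print("JSD(soft, pred):", js_divergence(soft, pred))
print("JSD(soft, hard):", js_divergence(soft, hard))
```

In this toy case the one-hot vector sits further from the vote distribution than the softer prediction does, which is the kind of gap that JSD/KLD evaluation is meant to expose.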
- AI developers
- Customer service platforms
- Mental health tech
- Human-computer interaction research
- AI models relying solely on hard consensus emotion labels
More sophisticated speech emotion recognition models will emerge with improved accuracy and reduced bias.
Enhanced human-AI interfaces will become more empathetic and context-aware, improving user experience across applications.
The development of AI systems capable of understanding nuanced emotional states could lead to new ethical considerations around AI manipulation and privacy.
Read at arXiv cs.LG