Evaluating Large Language Models Abilities for Addressee, Turn-change, and Next Speaker Prediction in Meetings

arXiv:2606.17542v1. Abstract: We investigate turn-taking in multimodal multi-party conversations using large language models (LLMs). We construct an evaluation framework for three tasks: addressee detection, turn-change prediction, and next speaker prediction. We compare supervised models trained for these tasks, text-based LLMs, multimodal LLMs (MM-LLMs), and human subjects. Experiments on the AMI corpus showed that LLMs outperformed supervised models and humans in next speaker prediction, despite not being trained on the target domain and without access to audio or visual i
This research assesses how well current LLMs handle turn-taking in multi-party conversation, covering three prediction tasks: addressee detection, turn-change prediction, and next speaker prediction.
This matters strategically because predicting who speaks next, and to whom, underpins advanced AI agents, virtual assistants, and interface design, with consequences for productivity and human-computer interaction.
The reported advantage of LLMs over supervised models and even human subjects in next speaker prediction on the AMI corpus, achieved without training on the target domain and without audio or visual input, suggests new pathways for AI to understand and manage social dynamics.
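To make the task definitions concrete, here is a minimal sketch of how turn-change and next speaker labels could be derived from a speaker-tagged transcript and scored by accuracy. It is an illustration only, not the paper's evaluation framework: the `Utterance` type, the `addressee` field, the toy meeting, and the "same speaker continues" baseline are all hypothetical, and the paper's actual label definitions on the AMI corpus may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Utterance:
    speaker: str                      # hypothetical speaker ID, e.g. "A"
    text: str
    addressee: Optional[str] = None   # hypothetical gold label; None if unannotated


def turn_taking_labels(utterances: List[Utterance]) -> List[Tuple[bool, str]]:
    """For each utterance except the last, return (turn_change, next_speaker)."""
    labels = []
    for current, following in zip(utterances, utterances[1:]):
        turn_change = current.speaker != following.speaker
        labels.append((turn_change, following.speaker))
    return labels


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Toy meeting excerpt (invented for illustration, not taken from AMI).
meeting = [
    Utterance("A", "Shall we start with the budget?", addressee="B"),
    Utterance("B", "Sure, I have the numbers."),
    Utterance("B", "We are slightly over."),
    Utterance("C", "By how much?", addressee="B"),
]

labels = turn_taking_labels(meeting)
gold_next = [next_speaker for _, next_speaker in labels]

# Naive baseline: assume the current speaker keeps the floor.
repeat_baseline = [u.speaker for u in meeting[:-1]]

print(labels)
print(accuracy(repeat_baseline, gold_next))
```

Any predictor, whether a supervised model, an LLM prompted with the transcript, or a human annotator, can be scored against the same gold labels in this way, which is what allows them to be compared directly.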
- AI agent developers
- Conversational AI companies
- Virtual meeting platforms
- Traditional supervised learning models
- Manual conversational analysis
Improved performance in AI systems that require understanding and participating in multi-party dialogues.
Development of more sophisticated and natural human-AI and human-human-AI interaction models, leading to greater AI integration in collaborative work environments.
Potential for AI to autonomously manage and orchestrate complex social dynamics in group settings, reducing friction and enhancing efficiency.
Read at arXiv cs.CL