
arXiv:2607.01345v1
Abstract: Turn-taking naturalness is central to full-duplex spoken dialogue systems, yet its automatic evaluation remains limited. Existing evaluations often rely on human judgments or behavior-specific timing metrics, making it difficult to compare heterogeneous timing failures within a unified framework. We propose TurnNat, a likelihood-based framework for automatic turn-taking naturalness evaluation in two-channel spoken dialogue. A causal turn-taking prediction model trained on natural conversations estimates future two-speaker voice-activity states, a
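The abstract is cut off before it says how the model's predictions become a score, so the sketch below is only an illustration of what a likelihood-based score over two-speaker voice-activity states might look like. It is not TurnNat's actual method. The four-state encoding, the function names `joint_state` and `mean_nll`, and the use of mean negative log-likelihood are all assumptions made for this example.

```python
import math

# Hypothetical encoding of the joint two-speaker voice-activity state.
# Index = (1 if speaker A is active) + (2 if speaker B is active).
STATES = ("silence", "a_only", "b_only", "overlap")


def joint_state(a_active: bool, b_active: bool) -> int:
    """Map per-channel voice activity to one of the four joint states."""
    return (1 if a_active else 0) + (2 if b_active else 0)


def mean_nll(predicted, observed, eps=1e-12):
    """Mean negative log-likelihood of observed states under predictions.

    predicted: one 4-element probability list per frame, as a stand-in for
               the output of a causal turn-taking prediction model.
    observed:  one joint-state index per frame, taken from the dialogue
               being evaluated.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("at least one frame is required")
    total = 0.0
    for probs, state in zip(predicted, observed):
        # Clamp to eps so a zero probability does not produce log(0).
        total -= math.log(max(probs[state], eps))
    return total / len(observed)
```

Under these assumptions, a lower value means the observed timing was more probable according to a model trained on natural conversations. A model that predicts all four states uniformly gives a baseline of `math.log(4)` per frame.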
The increasing sophistication and widespread deployment of spoken dialogue systems necessitate more robust and automatic evaluation methods to accelerate development.
An automatic, likelihood-based score could make it cheaper and more consistent to measure how human-like a system's turn-taking is, which in turn may affect the quality and adoption of conversational AI.
The ability to automatically and comprehensively evaluate turn-taking naturalness will allow AI developers to pinpoint and address conversational flaws more effectively, moving away from subjective human judgment.
- Conversational AI developers
- Speech recognition companies
- Customer service automation providers
- Manual human evaluation services
Improved naturalness of AI voice assistants and customer service bots.
Increased user satisfaction and broader adoption of AI-powered spoken interfaces in various sectors.
Further blurring of the line between human and AI conversational partners, raising new ethical and societal questions.
Read at arXiv cs.CL