A Finetuned SpeechLLM for Joint Multi-Granular L2 Assessment and Natural-Language Rationales

arXiv:2606.09470v1 (announce type: cross). Abstract: Automated L2 speech assessment can assign proficiency labels, but often lacks interpretability. We propose a rubric-guided SpeechLLM for multi-aspect, multi-granular assessment, trained with a hybrid objective combining supervised fine-tuning and Bounded Direct Preference Optimization. The model jointly predicts ordinal labels at the sentence-level (accuracy, fluency, prosody), word/phoneme-level accuracy, and generates a natural-language rationale in the same response. On SpeechOcean762, our approach matches or outperforms single-granularity m
The spread of capable LLMs, combined with demand for more nuanced and interpretable AI assessment across domains, is driving this work.
The approach moves automated speech assessment beyond bare proficiency labels to detailed, multi-granular feedback with natural-language explanations, which matters for education, customer service, and human-computer interaction.
Systems of this kind can return scores together with written rationales in a single response, which could tighten feedback loops and, in specific contexts, reduce reliance on human evaluators.
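The hybrid objective named in the abstract (supervised fine-tuning plus a preference-optimization term) can be illustrated with a minimal sketch. This is not the paper's implementation: the brief does not describe the bounding term of Bounded DPO, so the standard DPO loss stands in for it here, and the weight `alpha`, the temperature `beta`, and all function names are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sft_loss(token_logprobs: list[float]) -> float:
    """Mean negative log-likelihood of the reference response tokens."""
    return -sum(token_logprobs) / len(token_logprobs)


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard (unbounded) DPO loss on sequence log-probabilities.

    Assumption: used as a stand-in for the paper's Bounded DPO term.
    """
    margin = beta * ((policy_chosen - ref_chosen)
                     - (policy_rejected - ref_rejected))
    return -log_sigmoid(margin)


def hybrid_loss(token_logprobs: list[float],
                policy_chosen: float, policy_rejected: float,
                ref_chosen: float, ref_rejected: float,
                beta: float = 0.1, alpha: float = 0.5) -> float:
    """Weighted sum of the SFT and preference terms (alpha is hypothetical)."""
    sft = sft_loss(token_logprobs)
    pref = dpo_loss(policy_chosen, policy_rejected,
                    ref_chosen, ref_rejected, beta)
    return alpha * sft + (1.0 - alpha) * pref
```

In this sketch the SFT term pulls the model toward the reference labels and rationale, while the preference term rewards ranking a preferred response above a rejected one relative to a frozen reference model. How the paper actually balances or bounds the two terms is not stated in this brief.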
- Education technology sector
- Customer service platforms
- Language learning applications
- AI agents and developers
- Manual speech assessors
- Legacy speech recognition companies
More accurate and interpretable automated speech assessment becomes widely available.
Improved personalized learning experiences and reduced human labor costs in language assessment.
Enhanced AI 'understanding' of human communication nuances, leading to more natural and effective human-AI interaction across various applications.
Read at arXiv cs.AI