ASKD-Whisper: Adaptive Self-knowledge Distillation for Efficient and Low-Latency Automatic Speech Recognition

arXiv:2601.19919v2

Abstract: Knowledge distillation (KD) is one of the most effective paradigms for compressing large-scale foundation models into deployable architectures. In the context of Automatic Speech Recognition (ASR), previous studies have predominantly focused on forcing the student model to strictly mimic the predictive distribution of a massive teacher model. However, this static dependency often presents an inherent trade-off: while the student rapidly acquires basic linguistic representations, it simultaneously inherits the teacher's domain-specific blind s
The proliferation of large-scale AI models necessitates more efficient and deployable architectures, making knowledge distillation a critical technique for real-world application.
Efficient and low-latency ASR models are crucial for pervasive AI integration, particularly in edge computing and resource-constrained environments, broadening AI's practical utility.
The paper shifts the focus of knowledge distillation from strict teacher-student mimicry to adaptive self-knowledge mechanisms, which aim to let student models surpass the teacher's limitations.
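For context, the strict teacher-student mimicry described above is conventionally expressed as a fixed-weight loss that pulls the student's output distribution toward the teacher's. The sketch below shows only that static baseline for a single token position. It is not the ASKD-Whisper method, whose adaptive mechanism is not detailed in this brief. The names `softmax`, `kd_loss`, `alpha`, and `temperature` are illustrative assumptions, not identifiers from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, target_index,
            alpha=0.5, temperature=2.0):
    """Static distillation loss for one token position (illustrative).

    alpha weights the teacher-mimicry term; (1 - alpha) weights the
    cross-entropy against the ground-truth token. Both weights are
    fixed, which is the "static dependency" the abstract refers to.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Cross-entropy against the hard label, at temperature 1
    ce = -math.log(softmax(student_logits)[target_index])
    # The temperature**2 factor is the usual gradient-scale correction
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce
```

With `alpha=1.0` the student is trained purely to match the teacher, so any systematic teacher error becomes part of the training target. An adaptive scheme would presumably vary this weighting during training rather than fix it, but how ASKD-Whisper does so is not stated here.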
- Edge AI developers
- ASR providers
- Hardware manufacturers (for on-device AI)
- AI-driven product companies
- Companies reliant solely on massive, inefficient models
- Legacy ASR systems
Smaller yet more capable ASR models will become widely available for integration across applications.
Pervasive, real-time voice interfaces will accelerate the deployment of hands-free and assistive technologies.
Improved on-device AI will reduce reliance on cloud computing for certain tasks, impacting data sovereignty and privacy discussions.
Read at arXiv cs.CL