Gumbel-BEARD: Automatic Layer Selection for Self-Supervised Adaptation of Whisper in Low-Resource Domains

arXiv:2606.11429v1 (announce type: cross). Abstract: Speech foundation models often struggle in low-resource domains due to domain mismatch and data scarcity. We propose Gumbel-BEARD, a domain adaptation framework that automates Whisper encoder layer selection via an end-to-end trainable hard Gumbel-Softmax selector. It enables self-supervised adaptation with a BEST-RQ objective that dynamically adapts to target acoustic characteristics without manual tuning. Experiments on the MyST child speech corpus demonstrate efficiency and scalability: with 10 h of labeled data for fine-tuning, our method m
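To make the "hard Gumbel-Softmax selector" mentioned in the abstract more concrete, here is a minimal sketch of the forward pass of a generic hard Gumbel-Softmax draw over candidate layers. It is not the paper's implementation: the function name `gumbel_softmax_hard`, the temperature value, and the count of six candidate layers are all assumptions chosen for illustration. In actual training, the hard one-hot choice is usually paired with a straight-through estimator so that gradients flow through the soft probabilities; that part needs an autograd framework and is not shown here.

```python
import math
import random


def gumbel_softmax_hard(logits, tau=1.0, rng=random):
    """One hard Gumbel-Softmax draw (forward pass only).

    Returns (one_hot, soft): the discrete selection and the relaxed
    probabilities it was derived from. Hypothetical helper for illustration.
    """
    noisy = []
    for logit in logits:
        # Clamp u strictly inside (0, 1) so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) sample
        noisy.append((logit + gumbel) / tau)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]

    # Hard selection: one-hot vector at the argmax of the soft probabilities.
    winner = max(range(len(soft)), key=soft.__getitem__)
    one_hot = [1.0 if i == winner else 0.0 for i in range(len(soft))]
    return one_hot, soft


# Assumed setup: six candidate encoder layers with uniform initial logits.
rng = random.Random(0)
layer_logits = [0.0] * 6
one_hot, soft = gumbel_softmax_hard(layer_logits, tau=0.5, rng=rng)
selected_layer = one_hot.index(1.0)
```

With uniform logits every layer is equally likely to be picked on a given draw. As the logits are trained, the selection concentrates on the layers that best serve the adaptation objective, and lowering `tau` makes the soft probabilities closer to the one-hot choice.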
The proliferation of speech foundation models and the growing demand to apply them in diverse, often low-resource, environments make automated adaptation techniques necessary.
This work addresses a central challenge in AI adoption: making speech models robust in specialized domains where data scarcity and domain mismatch are prevalent.
Self-supervised, automated adaptation of speech models such as Whisper reduces manual effort and widens the applicability of advanced AI to niche or underserved languages and domains.
- AI developers in specialized speech domains
- Organizations with low-resource domain data
- Speech technology users in diverse linguistic settings
- Cloud AI service providers
- Manual model adaptation specialists
- Generic, one-size-fits-all speech model providers (without adaptation features)
Self-supervised adaptation methods become a standard component of foundation model deployment.
Accelerated AI adoption in sectors previously limited by data scarcity or domain specificity, particularly for voice interfaces.
Enhanced accessibility and utility of AI for a wider range of global languages and specialized industry terminologies, fostering new markets for AI applications.
Read at arXiv cs.CL