
arXiv:2511.08093v2. Abstract: Large speech recognition models like Whisper-small achieve high accuracy but are difficult to deploy on edge devices due to their high computational demand. To this end, we present a unified, cross-library evaluation of post-training quantization (PTQ) on Whisper-small that disentangles the impact of quantization scheme, method, granularity, and bit-width. Our study is based on four libraries: PyTorch, Optimum-Quanto, HQQ, and bitsandbytes. Experiments on LibriSpeech test-clean and test-other show that dynamic int8 quantization with Qua
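To make two of the axes named in the abstract concrete (granularity and bit-width), here is a minimal, dependency-free sketch of symmetric weight quantization. It is an illustration only, not the paper's code and not the API of PyTorch, Optimum-Quanto, HQQ, or bitsandbytes. The function names (`compute_scale`, `quantize`, `dequantize`, `row_errors`) and the toy weight matrix are hypothetical, and treating each matrix row as a "channel" is an assumption made for the example.

```python
from typing import List

Matrix = List[List[float]]


def compute_scale(values: List[float], bits: int = 8) -> float:
    """Symmetric scale: the largest magnitude maps to the largest integer code."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values: List[float], scale: float, bits: int = 8) -> List[int]:
    """Round each value to the nearest integer code and clamp to the signed range."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def row_errors(weights: Matrix, granularity: str, bits: int = 8) -> List[float]:
    """Largest absolute round-trip error in each row of the matrix."""
    if granularity == "per_tensor":
        # One scale shared by every value in the matrix.
        shared = compute_scale([v for row in weights for v in row], bits)
        scales = [shared] * len(weights)
    elif granularity == "per_channel":
        # One scale per row (rows stand in for output channels here).
        scales = [compute_scale(row, bits) for row in weights]
    else:
        raise ValueError(f"unknown granularity: {granularity!r}")

    errors = []
    for row, scale in zip(weights, scales):
        restored = dequantize(quantize(row, scale, bits), scale)
        errors.append(max(abs(a - b) for a, b in zip(row, restored)))
    return errors


# Toy matrix (hypothetical values): one small-magnitude row, one large-magnitude row.
weights = [[0.01, -0.02, 0.015], [1.0, -2.0, 1.5]]

if __name__ == "__main__":
    for granularity in ("per_tensor", "per_channel"):
        for bits in (8, 4):
            print(granularity, bits, row_errors(weights, granularity, bits))
```

In this toy matrix the small-magnitude row is reconstructed far more accurately with per-channel scales, because a single shared scale is dominated by the large-magnitude row. Lowering the bit-width from 8 to 4 coarsens the grid and increases the error in both settings.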
The growing demand for sophisticated AI models on resource-constrained devices makes model optimization techniques like quantization increasingly critical for adoption.
This research outlines practical methods for deploying large, high-performing AI models like Whisper on edge devices, expanding their accessibility and use cases beyond high-compute environments.
Running advanced speech recognition models efficiently on local hardware lowers the cost of AI-powered voice interfaces and applications and makes them more widely accessible.
- AI hardware manufacturers (edge devices)
- Developers of AI-powered mobile/embedded applications
- Companies seeking to reduce cloud inference costs
- Users in regions with limited internet connectivity
- Cloud-based AI inference providers (for certain use cases)
- Companies relying on large, unoptimized models for edge applications
More widespread deployment of accurate speech recognition on devices like smartphones, wearables, and IoT appliances.
Increased innovation in AI applications that require real-time, offline, and privacy-preserving voice processing.
Potential for new hardware-software co-design paradigms focusing on ultra-efficient on-device AI inference.
Read at arXiv cs.CL