ZEBRA: Zero-Shot Entropy-Regularized Prompt Learning for Base-to-Novel Generalization in Audio-Language Models

arXiv:2606.31587v1 (announce type: cross). Abstract: Audio-Language Models (ALMs) achieve strong zero-shot performance by aligning audio with textual class descriptions. Although prompt learning improves accuracy on base classes through few-shot supervised adaptation, we observe a critical trade-off: it often degrades performance on novel classes, sometimes falling below zero-shot accuracy. This exposes a base-to-novel generalization gap in prompt learning for ALMs. To address this issue, we propose ZEBRA (Zero-shot Entropy-Regularized Prompt Learning for Base-to-Novel Generalization), a
As Audio-Language Models (ALMs) are applied to audio classes outside their training data, techniques that improve their generalization become more important.
Narrowing the base-to-novel generalization gap matters for real-world use: a model adapted to a set of base classes should still recognize novel classes at least as well as it did zero-shot.
Prompt learning methods for ALMs may be able to ease the trade-off between base-class accuracy and novel-class performance through techniques such as entropy regularization.
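To make "entropy regularization" concrete, here is a minimal, generic sketch in Python. The abstract is cut off before it describes ZEBRA's actual objective, so this is not the paper's method. It only illustrates one common form of the idea: a cross-entropy loss on the labeled class minus a weighted entropy term, which discourages overconfident predictions. The function names, the weight `lam`, and the sign of the entropy term are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_loss(logits, label, lam=0.1):
    """Cross-entropy on the true label minus lam * prediction entropy.

    Hypothetical objective for illustration only: subtracting the entropy
    rewards less peaked predictions. lam=0 recovers plain cross-entropy.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[label])
    return cross_entropy - lam * entropy(probs)


if __name__ == "__main__":
    # Made-up audio-text similarity scores for three candidate classes.
    scores = [2.0, 1.0, 0.0]
    print(entropy_regularized_loss(scores, label=0, lam=0.0))
    print(entropy_regularized_loss(scores, label=0, lam=0.1))
```

With `lam=0.0` the function reduces to ordinary cross-entropy, so the second printed value is lower than the first by exactly `lam` times the entropy of the prediction. Whether a regularizer of this shape helps novel-class accuracy in ALMs is the paper's claim to establish, not something this sketch shows.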
- AI researchers and developers
- Companies deploying ALM solutions
- End-users of audio processing applications
- Companies relying on less generalized ALM techniques
Increased efficiency and reliability of zero-shot and few-shot learning in audio-language models.
Faster development and deployment of ALM applications in diverse and evolving environments.
Broader adoption of ALMs across various industries, including those requiring rapid adaptation to new data types.
Read at arXiv cs.AI