
arXiv:2606.29534v1 Announce Type: new Abstract: Popular ASR test sets adopt inconsistent conventions for numbers, disfluencies, entities, and casing, while standard normalizers erase the format distinctions users care about. Current benchmarks therefore cannot measure whether a model follows user preferences for output style. We introduce PreferenceASR, a test set evaluating ASR systems on their ability to follow natural-language preference instructions across four categories: normalization, entities, disfluencies, and case. Built from seven open-source corpora via a two-stage LLM-assisted pip
The rapid advancement of Speech LLMs calls for ASR benchmarking that goes beyond transcription accuracy to measure alignment with user preferences.
Measuring that alignment helps ASR systems better reflect human intent, which is crucial for natural and effective interaction with AI systems.
PreferenceASR brings user preference instructions into ASR benchmarking, an aspect that earlier test sets overlooked, which supports more human-centric model development.
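The abstract's point that standard normalizers erase the format distinctions users care about can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `normalize` is a toy stand-in for a typical lowercase-and-strip-punctuation normalizer, `wer` is a plain word-level edit-distance implementation, and the example sentences are invented. None of it is taken from PreferenceASR itself.

```python
import re


def normalize(text: str) -> str:
    """Toy normalizer (hypothetical): lowercase, drop punctuation, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


# Invented example: the user asked for cased, punctuated output.
reference = "I met Dr. Smith at 5 PM."
follows_preference = "I met Dr. Smith at 5 PM."
ignores_preference = "i met dr smith at 5 pm"

# Without normalization, the two hypotheses score differently.
raw_good = wer(reference, follows_preference)
raw_bad = wer(reference, ignores_preference)

# After normalization, both collapse to the same string and score identically.
norm_good = wer(normalize(reference), normalize(follows_preference))
norm_bad = wer(normalize(reference), normalize(ignores_preference))
```

In this sketch the hypothesis that ignores the requested casing and punctuation is penalized only when scoring is done on raw text. Once both sides pass through the normalizer, the two hypotheses are indistinguishable, which is the measurement gap the abstract describes.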
- Speech LLM developers
- ASR system providers
- Businesses relying on voice interfaces
- End-users of speech AI
- ASR models optimized solely for traditional metrics
- Developers neglecting preference alignment
ASR models will be developed to be more sensitive to user-specific output formats and styles.
Improved preference alignment in ASR will enhance the user experience and adoption of voice-controlled AI applications.
The ability to customize output via natural language will accelerate the development of highly personalized and adaptive AI assistants.
Read at arXiv cs.CL