Sign-Language Datasets at Scale: A Comprehensive Survey on Resources, Benchmarks, and Annotation Standards

arXiv:2606.19352v1 (announce type: new)

Abstract: Sign languages are expressive visual languages used by Deaf and Hard-of-Hearing (DHH) communities. Despite substantial progress in sign-language recognition, translation, and production, advances remain constrained by fragmented datasets, inconsistent annotations, and limited linguistic coverage. Existing benchmarks often fail to reflect real-world communication needs, and systematic analyses of these limitations remain limited. In this survey, we present a comprehensive index of sign-language datasets, covering 120 resources across 35 sign langu
As AI models proliferate, the lack of robust, standardized sign-language datasets has become a critical bottleneck to closing the accessibility gap.
This survey highlights a crucial technical limitation preventing AI from effectively serving the Deaf and Hard-of-Hearing communities, indicating a significant unmet need and potential for future AI development.
The explicit cataloging of fragmented resources and inconsistent annotation standards for sign languages provides a clear roadmap for future research and development, potentially accelerating progress in this neglected area of AI.
- AI researchers in sign language recognition
- Deaf and Hard-of-Hearing communities
- Developers of accessible AI applications
- Academic institutions focused on linguistic diversity
- Fragmented, non-standardized dataset initiatives
- AI models reliant on limited, bespoke sign language data
Increased investment and coordinated efforts to develop large-scale, standardized sign language datasets will likely follow.
Improved AI models for sign language translation and production could emerge, enhancing communication accessibility and integration.
This could foster new economic opportunities and educational paradigms for Deaf and Hard-of-Hearing individuals, enabled by advanced AI tools.
Read at arXiv cs.CL