
arXiv:2606.16351v1 Announce Type: new Abstract: We introduce the Transmasculine Attitudes and Speech Corpus (TMASC), a multimodal corpus of 196 transmasculine individuals, including questionnaire responses and 66 audio recordings. The questionnaire includes items exploring the vocal health of transmasculine individuals. The audio recordings include cough and throat-clearing samples, a reading passage, and additional session-specific questions. This paper outlines the development of this corpus and the data collection procedures. To illustrate the utility of this corpus, we present three case s
The increasing sophistication of AI models and the growing demand for diverse, ethically collected datasets for voice AI research drive the creation of specialized corpora like TMASC.
This corpus provides critical, nuanced data for speech technology development related to transmasculine voices, addressing a significant gap in current AI datasets and promoting more inclusive AI.
The availability of TMASC will enable researchers to build more accurate and inclusive voice recognition, synthesis, and health-monitoring AI tools specifically for transmasculine individuals.
- AI researchers
- Speech technology developers
- Transmasculine individuals
- Healthcare providers
- Developers relying on biased datasets
Improved performance of AI communication tools for transmasculine voices.
Reduced misgendering and improved healthcare outcomes through AI-powered vocal health monitoring.
Enhanced trust and adoption of AI technologies by diverse gender identity groups due to increased inclusivity.
Read at arXiv cs.CL