Target-Side Paraphrase Augmentation for Sign Language Translation with Large Language Models

arXiv:2605.31393v1 (announce type: cross)

Abstract: Sign language translation (SLT) remains constrained by limited paired sign-video/text corpora and heavy-tailed target vocabularies. We study target-side augmentation in which GPT-4o generates controlled paraphrase variants of reference sentences while the sign input remains unchanged. A Signformer-style pose-based Transformer is trained under a two-stage schedule: pre-training on the augmented corpus followed by fine-tuning on the original references. We evaluate on three datasets spanning complementary challenges: PHOENIX14T (German Sign Langu
Advanced large language models such as GPT-4o offer a new way to address data scarcity in niche domains such as sign language translation.
This research demonstrates a practical methodology for improving sign language translation by leveraging LLMs for data augmentation, potentially leading to more accessible communication technologies.
Using LLMs to generate high-quality paraphrase variants for low-resource languages and modalities can streamline data collection and model training.
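The approach described in the abstract (paraphrasing only the target text while the sign input stays fixed, then pre-training on the augmented corpus and fine-tuning on the original references) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Pair`, `augment_targets`, `two_stage_schedule`, and `toy_paraphrase` are hypothetical names, the paraphraser is a stand-in for an LLM call, and including the original references in the augmented corpus is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Pair:
    video_id: str      # identifier of the sign input, never modified
    text: str          # target-side sentence
    is_original: bool  # True for reference sentences, False for paraphrases


def augment_targets(
    pairs: List[Pair],
    paraphrase: Callable[[str, int], List[str]],
    n_variants: int,
) -> List[Pair]:
    """Pair each unchanged sign input with its reference plus paraphrases."""
    augmented: List[Pair] = []
    for pair in pairs:
        augmented.append(pair)  # assumption: originals stay in the corpus
        seen = {pair.text}
        for variant in paraphrase(pair.text, n_variants):
            # Skip empty or duplicate paraphrases.
            if variant and variant not in seen:
                seen.add(variant)
                augmented.append(Pair(pair.video_id, variant, False))
    return augmented


def two_stage_schedule(
    pairs: List[Pair],
    paraphrase: Callable[[str, int], List[str]],
    n_variants: int = 2,
) -> List[Tuple[str, List[Pair]]]:
    """Return the training data for each stage, in order."""
    return [
        ("pretrain", augment_targets(pairs, paraphrase, n_variants)),
        ("finetune", list(pairs)),  # original references only
    ]


def toy_paraphrase(text: str, n: int) -> List[str]:
    # Hypothetical stand-in; a real setup would query an LLM here.
    return [f"{text} (variant {i + 1})" for i in range(n)]


corpus = [
    Pair("v1", "morgen regnet es", True),
    Pair("v2", "heute ist es sonnig", True),
]
stages = two_stage_schedule(corpus, toy_paraphrase, n_variants=2)
for name, data in stages:
    print(name, len(data))
```

Keeping `video_id` fixed across variants is what makes this target-side augmentation: the model sees the same sign input with several acceptable wordings before being fine-tuned back onto the reference style.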
- Sign language users
- NLP researchers
- Assistive technology developers
Improved accuracy and fluency in sign language translation systems.
Increased adoption and integration of sign language translation in various applications and platforms.
Enhanced societal inclusion and communication accessibility for deaf and hard-of-hearing communities globally.
Read at arXiv cs.AI