Automatic identification of diagnosis from hospital discharge letters via weakly supervised Natural Language Processing

arXiv:2410.15051v3. Abstract: Identifying patient diagnoses from hospital discharge letters is essential for large-scale cohort selection and epidemiological research, but traditional supervised approaches require extensive manual annotation, which is often impractical for large textual datasets. We present a weakly supervised Natural Language Processing (NLP) pipeline for classifying Italian discharge letters without document-level manual annotation. The method extracts diagnosis-related sentences, generates semantic embeddings using a transformer model further pre...
The proliferation of advanced NLP models and the increasing availability of medical text data are enabling more sophisticated and less labor-intensive approaches to healthcare data analysis.
The pipeline lowers the barrier to extracting diagnostic information from unstructured medical text, which supports larger-scale epidemiological research and more efficient patient cohort selection.
Because it needs no document-level manual annotation, large-scale analysis of discharge letters becomes more feasible and less costly.
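The first two steps named in the abstract (extracting diagnosis-related sentences, then embedding them) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cue words in `DIAGNOSIS_CUES`, the naive sentence splitter, and the sample letter are all invented for the example, and `toy_embed` is a hashed bag-of-words stand-in for the transformer embeddings the paper actually uses.

```python
import hashlib
import math
import re

# Hypothetical cue words for spotting diagnosis-related sentences
# (an assumption for this sketch; the paper's extraction rule is not given here).
DIAGNOSIS_CUES = ("diagnosi", "diagnosis")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_diagnosis_sentences(letter, cues=DIAGNOSIS_CUES):
    """Keep only the sentences that contain at least one cue word."""
    return [s for s in split_sentences(letter) if any(c in s.lower() for c in cues)]


def toy_embed(sentence, dim=64):
    """Stand-in for a transformer embedding: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", sentence.lower()):
        # md5 gives a hash that is stable across runs, unlike the built-in hash().
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Invented sample letter, for illustration only.
    letter = (
        "Paziente ricoverato per dispnea. "
        "Diagnosi di dimissione: polmonite. "
        "Controllo tra sette giorni."
    )
    for sentence in extract_diagnosis_sentences(letter):
        print(sentence, len(toy_embed(sentence)))
```

In a real setting, `toy_embed` would be replaced by sentence embeddings from a pretrained language model. The steps that follow embedding are cut off in the abstract above, so they are not sketched here.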
- Healthcare researchers
- NLP developers
- Medical AI companies
- Public health organizations
- Companies relying on manual medical data annotation
- Traditional medical data warehousing services
Faster and cheaper extraction of patient diagnoses from discharge letters becomes possible.
Epidemiological studies and real-world evidence generation improve as structured diagnostic data become easier to obtain.
Development of personalized medicine and early disease detection systems may accelerate through analysis of large diagnostic histories that were previously inaccessible.
Read at arXiv cs.LG