
arXiv:2601.05847v2 Announce Type: replace
Abstract: We revisit the problem of constructing interoperable patient digital twins from unstructured electronic health records (EHRs) and argue that the task is better cast not as a cascade of extraction modules but as constrained generation of a valid FHIR bundle. We introduce SG-LLM, a schema-grounded LLM extractor that (i) augments the prompt with candidate SNOMED-CT, RxNorm, and LOINC codes retrieved through a SapBERT index, (ii) decodes under a JSON Schema derived directly from FHIR R4 StructureDefinitions, and (iii) closes a validator-in-the-lo
LLMs have made structured extraction from free text practical, which makes this work timely for medical record interoperability.
The paper proposes generating patient digital twins as schema-valid FHIR bundles coded with SNOMED-CT, RxNorm, and LOINC, a standardized form suited to research, diagnostics, and personalized medicine.
Casting extraction as constrained generation of a validated FHIR bundle, rather than as a cascade of extraction modules, may make unstructured EHR data more usable and avoid some failure points of multi-stage pipelines.
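The first two steps named in the abstract (retrieving candidate terminology codes for the prompt, then holding the output to a schema) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: plain string similarity from `difflib` stands in for the SapBERT index, a three-entry toy table stands in for SNOMED CT, and the hand-written `validate_bundle` checks only a small subset of FHIR R4 after generation instead of constraining decoding. The names `SNOMED_TOY`, `retrieve_candidates`, `build_prompt`, and `validate_bundle` are hypothetical. The third step is cut off in the abstract as shown here, so it is not sketched.

```python
from difflib import SequenceMatcher

SNOMED_SYSTEM = "http://snomed.info/sct"

# Toy terminology table (assumption: a real system would query a full index).
SNOMED_TOY = {
    "44054006": "Type 2 diabetes mellitus",
    "38341003": "Hypertensive disorder",
    "22298006": "Myocardial infarction",
}


def retrieve_candidates(mention, k=2):
    """Rank toy codes by string similarity to the mention.

    Stand-in for an embedding search; not how SapBERT retrieval works.
    """
    scored = [
        (SequenceMatcher(None, mention.lower(), name.lower()).ratio(), code, name)
        for code, name in SNOMED_TOY.items()
    ]
    scored.sort(reverse=True)
    return [(code, name) for _, code, name in scored[:k]]


def build_prompt(note, mention):
    """Augment the clinical note with the retrieved candidate codes."""
    lines = [f"- {code}: {name}" for code, name in retrieve_candidates(mention)]
    return f"Note: {note}\nCandidate SNOMED CT codes:\n" + "\n".join(lines)


def validate_bundle(bundle):
    """Return a list of problems; an empty list means these checks pass.

    Covers only a hand-picked subset of FHIR R4, limited to Condition entries.
    """
    errors = []
    if bundle.get("resourceType") != "Bundle":
        errors.append("resourceType must be 'Bundle'")
    if "type" not in bundle:
        errors.append("Bundle.type is required")
    for i, entry in enumerate(bundle.get("entry", [])):
        res = entry.get("resource", {})
        if res.get("resourceType") != "Condition":
            errors.append(f"entry[{i}]: only Condition is handled in this sketch")
            continue
        codings = res.get("code", {}).get("coding", [])
        if not codings:
            # Requiring a coding is a choice of this sketch, not a FHIR rule.
            errors.append(f"entry[{i}]: Condition has no coding")
        for coding in codings:
            if coding.get("system") != SNOMED_SYSTEM:
                errors.append(f"entry[{i}]: unexpected code system")
            elif coding.get("code") not in SNOMED_TOY:
                errors.append(f"entry[{i}]: code not in terminology table")
        if "reference" not in res.get("subject", {}):
            errors.append(f"entry[{i}]: Condition.subject reference is required")
    return errors


if __name__ == "__main__":
    print(build_prompt("Pt with longstanding T2DM on metformin.", "type 2 diabetes"))
    candidate_output = {
        "resourceType": "Bundle",
        "type": "collection",
        "entry": [{
            "resource": {
                "resourceType": "Condition",
                "code": {"coding": [{"system": SNOMED_SYSTEM, "code": "44054006"}]},
                "subject": {"reference": "Patient/example"},
            }
        }],
    }
    print(validate_bundle(candidate_output))
```

Checking the output after generation is the weaker of the two options: schema-constrained decoding, as the abstract describes it, would prevent invalid structure from being produced in the first place, whereas this sketch can only report it.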
- Healthcare providers
- Medical AI researchers
- Patients (improved care)
- Electronic Health Record vendors
- Manual data entry services
- Legacy medical data extraction methods
Interoperable patient digital twins become a more realistic and widely used tool.
AI-powered diagnostic and treatment tools that build on this structured data could develop faster.
A global, standardized medical data ecosystem could eventually emerge, changing healthcare delivery and research.
Read at arXiv cs.CL