Measuring the sensitivity of LLM-based structured extraction to prompt, model, and schema choices in clinical discharge summaries

arXiv:2606.05970v1. Abstract: Large language models are increasingly used for structured extraction from clinical free-text notes, but the sensitivity of their output to upstream configuration choices is less understood than their accuracy on fixed benchmarks. This work measures that sensitivity without human-annotated ground truth, by holding the extraction task fixed and varying one choice at a time. The fixed schema comprises 17 clinical documentation flags on a three-way yes/no/not_documented value set and a 47-tag vocabulary for the primary admission reason. Three prompt
As LLMs spread into specialized domains such as healthcare and real-world deployments accelerate, evaluating their reliability requires methods that go beyond accuracy on fixed benchmarks.
Knowing how sensitive LLM-based extraction is to prompt, model, and schema choices matters for building trustworthy applications in critical sectors, where it bears on patient safety and data utility.
The work measures LLM output sensitivity without human-annotated ground truth, which could streamline evaluation and shorten model development cycles.
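One way to picture measuring sensitivity without ground truth is to compare the outputs of two configurations on the same notes and count how often they disagree. The sketch below is illustrative only: the disagreement rate is an assumed metric, not necessarily the one the paper uses, and the function name `flag_disagreement` and the flag names in the example are hypothetical. Only the three-way value set comes from the abstract.

```python
from collections import Counter

# Three-way value set named in the abstract.
ALLOWED_VALUES = {"yes", "no", "not_documented"}


def flag_disagreement(run_a, run_b):
    """Return, per flag, the fraction of notes where two runs disagree.

    run_a and run_b are lists of dicts (one dict per note, same note order),
    each mapping a flag name to one of ALLOWED_VALUES. No reference labels
    are needed: the two runs are compared only with each other.
    """
    if len(run_a) != len(run_b):
        raise ValueError("both runs must cover the same notes")

    differing = Counter()
    compared = Counter()
    for note_a, note_b in zip(run_a, run_b):
        # Compare only flags that both runs emitted for this note.
        for flag in note_a.keys() & note_b.keys():
            value_a, value_b = note_a[flag], note_b[flag]
            if value_a not in ALLOWED_VALUES or value_b not in ALLOWED_VALUES:
                raise ValueError(f"unexpected value for flag {flag!r}")
            compared[flag] += 1
            if value_a != value_b:
                differing[flag] += 1

    return {flag: differing[flag] / compared[flag] for flag in compared}


# Hypothetical outputs for two notes under a baseline prompt and a variant
# prompt; the flag names are invented for illustration.
baseline = [
    {"follow_up_documented": "yes", "allergies_documented": "no"},
    {"follow_up_documented": "not_documented", "allergies_documented": "no"},
]
variant = [
    {"follow_up_documented": "yes", "allergies_documented": "no"},
    {"follow_up_documented": "no", "allergies_documented": "no"},
]
rates = flag_disagreement(baseline, variant)
```

Here `rates` maps each flag to its disagreement rate between the two runs. Repeating the comparison while changing only the prompt, only the model, or only the schema would give one such table per varied choice.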
- AI developers in healthcare
- Healthcare providers adopting AI
- Patients benefiting from improved data extraction
- AI ethics and safety researchers
- Developers relying on black-box LLM implementations
- Traditional manual data extraction services
More reliable and deployable LLM-based tools for structured data extraction from clinical narratives may emerge.
Improved data quality from clinical notes could accelerate medical research, drug discovery, and personalized treatment plans.
The methodology could be generalized to other sensitive domains, fostering broader adoption of AI in critical infrastructure with greater confidence.
Read at arXiv cs.CL