Can Post-Training Turn LLMs into Good Medical Coders? An Empirical Study of Generative ICD Coding

arXiv:2606.13940v1

Abstract: Automated International Classification of Diseases (ICD) coding is a core medical-coding task for billing, epidemiology, and clinical decision support. Generative large language models (LLMs) are often reported as weak medical coders, but this finding mainly comes from inference-time settings such as prompting, retrieval, reranking, or tool use, leaving the role of task-specific post-training underexplored. We present a controlled empirical study of post-training for generative ICD coding, comparing discriminative baselines with LLM coders across
The proliferation of generative LLMs has led to widespread testing in specialized domains, making empirical studies on their task-specific performance crucial for practical application.
Improving automated medical coding with LLMs could reduce administrative costs, improve billing accuracy, and provide better data for epidemiology and clinical decision support.
The perception of LLMs' capabilities in specialized, structured tasks like medical coding could shift from 'weak' performers to potentially effective tools, given proper post-training.
- Healthcare providers
- AI developers specializing in healthcare
- Medical billing companies
- Patients (through potentially lower costs and better data)
- Human medical coders (for routine tasks)
- Traditional medical coding software vendors (if LLMs prove superior)
Increased adoption of LLMs for automated medical coding tasks due to demonstrated efficacy.
Reduced operational costs for healthcare systems, potentially freeing up resources for patient care or R&D.
The development of highly specialized 'medical AI agents' that can autonomously handle complex diagnostic and administrative functions based on improved coding accuracy.
Read at arXiv cs.CL