DrugRAG: Enhancing Pharmacy LLM Performance Through A Novel Retrieval-Augmented Generation Pipeline

arXiv:2512.14896v2

Abstract: In our study, we evaluated large language model (LLM) performance on pharmacy licensure-style question-answering tasks and developed an external knowledge integration method to improve accuracy. We benchmarked ten LLMs with varying parameter sizes (8 billion to 70+ billion) using a 141-question pharmacy dataset, measuring baseline accuracy without modification. Baseline performance ranged from 46% to 92%, with GPT-5 (92%) and o3 (89%) achieving the highest scores, while smaller open-source models showed substantially lower performance.
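To make the baseline measurement concrete, here is a minimal sketch of how accuracy on a multiple-choice question set could be scored and models ranked. It is not the authors' evaluation code: the function names, the model names, the four-question answer key, and the assumption that each answer is a single option letter are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def accuracy(predictions: List[str], answer_key: List[str]) -> float:
    """Return the fraction of questions whose predicted option matches the key.

    Assumes (hypothetically) that each answer is a single option letter;
    comparison ignores case and surrounding whitespace.
    """
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer_key must have the same length")
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    correct = sum(
        p.strip().upper() == k.strip().upper()
        for p, k in zip(predictions, answer_key)
    )
    return correct / len(answer_key)


def rank_models(
    results: Dict[str, List[str]], answer_key: List[str]
) -> List[Tuple[str, float]]:
    """Score every model against the same key and sort best-first."""
    scored = [(name, accuracy(preds, answer_key)) for name, preds in results.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data for illustration only; not taken from the paper's dataset.
answer_key = ["A", "C", "B", "D"]
results = {
    "model_x": ["A", "C", "B", "A"],  # hypothetical model name
    "model_y": ["a", "B", "B", "C"],  # hypothetical model name
}

ranking = rank_models(results, answer_key)
for name, score in ranking:
    print(f"{name}: {score:.0%}")
```

If the reported percentages are simple fraction-correct scores (an assumption, since the abstract does not spell out the metric), each question in a 141-question set is worth about 0.7 percentage points, so 92% corresponds to roughly 130 correct answers.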
The rapid advancement and widespread adoption of large language models are pushing developers to find practical applications and demonstrate measurable improvements in specialized domains.
This research provides a clear benchmark for LLM performance in pharmacy, a critical, high-stakes domain, and presents retrieval augmentation with external knowledge as a way to improve accuracy.
The performance gap between general-purpose LLMs and those augmented for specific knowledge domains points to a path for enterprise AI solutions beyond foundational models, and it challenges claims that general-purpose LLMs are sufficient on their own.
- Specialized AI solution providers
- Healthcare sector AI adopters
- LLM developers focusing on retrieval-augmented generation (RAG)
- Pharmaceutical industry
- General-purpose LLM providers without strong domain adaptation strategies
- Knowledge workers in pharmacy without AI-augmented tools
- Traditional knowledge retrieval systems
Pharmacy and medical professionals will increasingly integrate LLM-powered assistants for decision support and information retrieval.
This success will accelerate the development of RAG-based LLMs across other regulated and specialized industries, such as law and finance.
The demonstrated performance of large models like GPT-5 suggests a potential for 'deskilling' or significant augmentation of highly specialized professional roles, creating new educational and regulatory challenges.
Read at arXiv cs.AI