arXiv:2602.11460v2 Announce Type: replace

Abstract: Large language models (LLMs) have shown great potential for healthcare applications. However, existing evaluation benchmarks provide minimal coverage of Alzheimer's Disease and Related Dementias (ADRD). To address this gap, we introduce ADRD-Bench, a preliminary ADRD-specific LLM benchmark. ADRD-Bench has two components: 1) ADRD Unified QA, a synthesis of 1,438 questions consolidated from seven established medical benchmarks, providing a unified assessment of clinical knowledge; and 2) ADRD Caregiving QA, a novel set of 149 questions derived
Source: arXiv cs.CL. The full report is available from the original publisher.
