A Controlled Audit of Pretraining Contamination in Public Medical Vision-Language Benchmarks

arXiv:2606.10066v1 (announce type: cross)

Abstract: Medical vision-language models (VLMs) are evaluated on public benchmarks whose images and question-answer pairs have been freely downloadable for years, yet reported accuracy assumes these examples were absent from pretraining. We audit open VLMs on SLAKE-En, PathVQA, VQA-RAD, and an auxiliary public OmniMedVQA mirror using four detector families: image-side near-neighbour overlap against PMC-OA-beta, canonical-order exchangeability, cohort-relative Min-K%++ tail enrichment, and cross-model top-K overlap. We find measurable image-side source ov
As medical vision-language models multiply and move toward deployment, the integrity of the public benchmarks used to evaluate them needs re-examination.
If benchmark examples were present in pretraining data, reported accuracy overstates what a model can actually do, which risks misdiagnosis and erodes trust in clinical AI applications.
Existing benchmark results for medical VLMs are now under scrutiny: developers and evaluators will need stricter data provenance checks and may have to re-evaluate model capabilities.
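One form a provenance check can take is the image-side near-neighbour overlap named in the abstract. The sketch below is illustrative only and is not the paper's detector: it assumes image embeddings have already been computed by some encoder, and the function names and the 0.95 cosine threshold are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_near_neighbours(
    benchmark: Sequence[Vector],
    corpus: Sequence[Vector],
    threshold: float = 0.95,  # assumed cut-off, not taken from the paper
) -> List[Tuple[int, int, float]]:
    """Return (benchmark_index, corpus_index, similarity) for each benchmark
    image whose nearest corpus image is at least `threshold` similar."""
    flagged = []
    for i, b in enumerate(benchmark):
        best_j, best_sim = -1, -1.0
        # Brute-force nearest neighbour; a real audit would use an index.
        for j, c in enumerate(corpus):
            sim = cosine(b, c)
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_sim >= threshold:
            flagged.append((i, best_j, best_sim))
    return flagged


def overlap_rate(
    benchmark: Sequence[Vector],
    corpus: Sequence[Vector],
    threshold: float = 0.95,
) -> float:
    """Fraction of benchmark images with a near neighbour in the corpus."""
    if not benchmark:
        return 0.0
    return len(flag_near_neighbours(benchmark, corpus, threshold)) / len(benchmark)
```

A flagged pair is only a candidate for overlap. Whether it reflects true contamination depends on the encoder and on manual inspection, neither of which this sketch covers.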
- AI auditing firms
- Data provenance solutions
- Ethical AI research
- Hospitals with strict AI validation teams
- VLM developers with contaminated models
- Benchmark creators with lax data hygiene
- Early adopters relying on unverified VLM efficacy
Increased scrutiny and demand for robust data auditing processes in AI development, especially in critical sectors like medicine.
A potential slowdown in the adoption of medical VLMs as stakeholders demand validated, contamination-free benchmarks and models.
The emergence of new, proprietary, and highly curated medical datasets that become competitive differentiators for AI developers.
Read at arXiv cs.LG