SIGNAL · AI · Jun 10, 2026, 4:00 AM · Signal 75 · Short term

A Controlled Audit of Pretraining Contamination in Public Medical Vision-Language Benchmarks

arXiv:2606.10066v1 · Announce Type: cross

Abstract: Medical vision-language models (VLMs) are evaluated on public benchmarks whose images and question-answer pairs have been freely downloadable for years, yet reported accuracy assumes these examples were absent from pretraining. We audit open VLMs on SLAKE-En, PathVQA, VQA-RAD, and an auxiliary public OmniMedVQA mirror using four detector families: image-side near-neighbour overlap against PMC-OA-beta, canonical-order exchangeability, cohort-relative Min-K%++ tail enrichment, and cross-model top-K overlap. We find measurable image-side source ov…
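The abstract names its detector families but does not show how they are computed. As a rough, non-authoritative illustration, the sketch below implements generic versions of three of them in plain Python: a Min-K%++ sequence score (following the published Min-K%++ definition, not necessarily the paper's "cohort-relative tail enrichment" variant), a permutation test in the spirit of canonical-order exchangeability, and a Jaccard overlap of the top-K most suspicious items under two models. The function names, the `sequence_logprob` callable, and defaults such as `k=0.2` and `n_perm=99` are hypothetical choices, not taken from the paper.

```python
import math
import random
from typing import Callable, Dict, Hashable, List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def min_k_plus_plus(logits_per_step: Sequence[Sequence[float]],
                    token_ids: Sequence[int],
                    k: float = 0.2) -> float:
    """Min-K%++ score for one sequence; higher suggests 'more likely seen'.

    logits_per_step[t] is the next-token logit row that predicts token_ids[t].
    Each token's log-probability is standardised against the mean and standard
    deviation of log-probabilities under that predictive distribution, and the
    score averages the lowest k fraction of these standardised values.
    """
    z_scores = []
    for row, tok in zip(logits_per_step, token_ids):
        logp = log_softmax(row)
        probs = [math.exp(lp) for lp in logp]
        mu = sum(p * lp for p, lp in zip(probs, logp))
        var = sum(p * lp * lp for p, lp in zip(probs, logp)) - mu * mu
        sigma = math.sqrt(max(var, 1e-12))  # guard against a flat distribution
        z_scores.append((logp[tok] - mu) / sigma)
    n = max(1, int(len(z_scores) * k))
    return sum(sorted(z_scores)[:n]) / n


def exchangeability_p_value(sequence_logprob: Callable[[List], float],
                            examples: Sequence,
                            n_perm: int = 99,
                            seed: int = 0) -> float:
    """One-sided permutation test: is the canonical order unusually likely?

    sequence_logprob is a hypothetical callable returning the model's total
    log-likelihood of the examples concatenated in the given order. If the
    benchmark order was never seen, shuffled orders should score about as
    well as the canonical one and the p-value should not be small.
    """
    rng = random.Random(seed)
    canonical = sequence_logprob(list(examples))
    hits = 0
    for _ in range(n_perm):
        shuffled = list(examples)
        rng.shuffle(shuffled)
        if sequence_logprob(shuffled) >= canonical:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def top_k_overlap(scores_a: Dict[Hashable, float],
                  scores_b: Dict[Hashable, float],
                  k: int) -> float:
    """Jaccard overlap of the k highest-scoring item ids under two models."""
    top_a = set(sorted(scores_a, key=scores_a.get, reverse=True)[:k])
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0
```

In a real audit the logit rows would come from the VLM under test and `sequence_logprob` would wrap a forward pass over the concatenated question-answer pairs; both are left abstract here. A small p-value or an unusually large top-K overlap is evidence worth investigating, not proof of contamination on its own.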

Why this matters

Why now

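Medical vision-language models are proliferating and moving toward deployment, yet the public benchmarks used to validate them have offered freely downloadable images and question-answer pairs for years. That makes it timely to check whether those examples leaked into pretraining data.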

Why it’s important

Contamination can inflate benchmark scores, so a model may appear more accurate than it would be on genuinely unseen medical images. Overstated performance risks misdiagnosis and erodes trust in clinical AI applications, with significant real-world consequences.

What changes

The reliability of existing benchmark results for medical VLMs is now under scrutiny, requiring developers and evaluators to implement stricter data provenance checks and potentially re-evaluate model capabilities.
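One concrete form of such a provenance check is the image-side near-neighbour overlap named in the abstract: embed every benchmark image and every image in a candidate source corpus (the paper audits against PMC-OA-beta), then flag benchmark images whose nearest source neighbour is almost identical. The sketch below assumes the embeddings already exist, produced by some image encoder that is not specified here, and uses brute-force cosine similarity with a hypothetical threshold of 0.95. The function name, ids, and threshold are illustrative, not the paper's.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def _unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [0.0] * len(v)


def flag_near_duplicates(benchmark: Dict[str, Sequence[float]],
                         reference: Dict[str, Sequence[float]],
                         threshold: float = 0.95) -> Dict[str, Tuple[str, float]]:
    """Return {benchmark_id: (closest_reference_id, cosine)} for cosine >= threshold.

    Brute force, O(len(benchmark) * len(reference)); an audit against a
    large corpus would swap in an approximate nearest-neighbour index.
    """
    ref = {rid: _unit(v) for rid, v in reference.items()}
    flagged: Dict[str, Tuple[str, float]] = {}
    for bid, vec in benchmark.items():
        q = _unit(vec)
        best_id: Optional[str] = None
        best_sim = -1.0
        for rid, r in ref.items():
            sim = sum(a * b for a, b in zip(q, r))
            if sim > best_sim:
                best_id, best_sim = rid, sim
        if best_id is not None and best_sim >= threshold:
            flagged[bid] = (best_id, best_sim)
    return flagged


# Toy usage with made-up 2-D "embeddings".
benchmark = {"q1": [1.0, 0.0], "q2": [0.0, 1.0]}
reference = {"r1": [0.99, 0.01], "r2": [-1.0, 0.0]}
print(flag_near_duplicates(benchmark, reference))  # flags q1 (matched to r1) but not q2
```

The threshold trades recall for precision: lowering it catches re-cropped or re-encoded copies of source figures but also flags merely similar anatomy, so flagged pairs would still need manual review.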

Winners

· AI auditing firms
· Data provenance solutions
· Ethical AI research
· Hospitals with strict AI validation teams

Losers

· VLM developers with contaminated models
· Benchmark creators with lax data hygiene
· Early adopters relying on unverified VLM efficacy

Second-order effects

Direct

Increased scrutiny and demand for robust data auditing processes in AI development, especially in critical sectors like medicine.

Second

A potential slowdown in the adoption of medical VLMs as stakeholders demand validated, contamination-free benchmarks and models.

Third

The emergence of new, proprietary, and highly curated medical datasets that become competitive differentiators for AI developers.

Editorial confidence: 90 / 100 · Structural impact: 55 / 100

Original report

This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.

Read at arXiv cs.LG

#cs.CV #cs.AI #cs.LG

Tracked by The Continuum Brief · live intelligence network
