Benchmarking Convolutional, Transformer, Hybrid, and Vision Language Models for Multi Disease Retinal Screening

arXiv:2605.26283v1 Announce Type: cross
Abstract: Modern deep learning offers powerful tools for automated retinal screening, but it remains unclear how different visual model families compare in realistic multi-disease settings and under domain shift. In this work, we benchmark twelve architectures across four model families: convolutional neural networks, vision transformers, hybrid CNN-transformer backbones, and vision-language models, using the Retinal Fundus Multi-disease Image Dataset (RFMiD). We evaluate two tasks: binary screening for any retinal disease and multi-label classification
The growing number of deep learning architectures and the rising demand for automated medical diagnostics are prompting comprehensive benchmarking studies in realistic settings.
This research compares the performance of several AI model families for medical screening, with direct relevance to healthcare automation and diagnostic accuracy.
It should clarify which model families are most effective for multi-disease retinal screening, especially under domain shift.
- AI healthcare solution providers
- Patients with retinal diseases
- Ophthalmologists
- Deep learning researchers
- Traditional manual screening methods
- Less efficient AI models
- Undiagnosed conditions
Improved accuracy and efficiency in diagnosing retinal diseases using AI.
Accelerated adoption of specific, high-performing AI models within ophthalmology clinics globally.
Reduced healthcare costs and increased accessibility to early disease detection for a broader population.
Read at arXiv cs.LG