Training Supervised Screening Models on 3+ Million Expert-Labelled Fundus Images

The first fully supervised medical-imaging model trained at this scale—and the first ophthalmology model of its kind.

Researchers from the AIM Lab, in collaboration with the Chinese company Airdoc and multiple partner hospitals, have announced a landmark project to train end-to-end supervised screening models on a dataset of more than three million expert-labelled fundus photographs. To our knowledge, this is the first time a medical-imaging model has been trained with full supervision on a clinician-annotated corpus of this size, and the first ophthalmology model to reach this scale.

Why this matters

Eye diseases such as diabetic retinopathy, glaucoma, and macular disorders remain among the leading causes of preventable vision loss worldwide. Millions of people are at risk of losing sight every year, yet many cases could be avoided through timely and accurate screening. Unfortunately, access to reliable screening is often limited. Many health systems face a shortage of trained specialists, and even where services are available, differences in imaging devices and variability in clinical expertise can lead to inconsistent results.

This project addresses these challenges by building models trained directly on high-quality, clinician-verified labels. Instead of relying on indirect signals or proxy labels, the models learn from expert consensus annotations that reflect real diagnostic decision-making. This foundation allows the system to target the clinical measures that matter most: sensitivity, specificity, area under the curve (AUC), calibration, and interpretability. By explicitly aligning model training with these standards, the team aims to develop screening tools that are not only accurate in research settings but also reliable, fair, and usable in day-to-day clinical practice.
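
As a rough illustration of the quantitative measures named above, the following sketch computes sensitivity, specificity, AUC, and a simple expected calibration error for a binary screening task. It is not the project's evaluation code: the function names, the 0.5 threshold, the ten-bin calibration scheme, and the toy labels and scores are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at an assumed decision threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the
    number of cases, so it only suits small illustrative inputs.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def expected_calibration_error(
    y_true: Sequence[int], y_score: Sequence[float], n_bins: int = 10
) -> float:
    """Weighted mean gap between predicted probability and observed rate."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(y_true, y_score):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((y, s))
    total = len(y_true)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_score = sum(s for _, s in b) / len(b)
        observed_rate = sum(y for y, _ in b) / len(b)
        ece += (len(b) / total) * abs(mean_score - observed_rate)
    return ece


# Toy data, invented for illustration: 1 = referable disease, 0 = not.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

sens, spec = sensitivity_specificity(labels, scores)
area = auc(labels, scores)
ece = expected_calibration_error(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.4f} ECE={ece:.3f}")
```

In practice the operating threshold would be chosen for the clinical setting, for example to favour sensitivity in a screening programme, rather than fixed at 0.5.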

The broader significance goes beyond performance metrics. A robust and explainable screening model has the potential to relieve pressure on overstretched healthcare systems, extend screening to underserved communities, and provide clinicians with decision support that improves both confidence and consistency. In the long term, such infrastructure could help make preventive eye care more accessible and equitable worldwide, reducing avoidable blindness and improving quality of life for millions of patients.

Outcomes

Success will be measured by stable, clinically meaningful performance across centers and devices; clear explanations for model outputs; and evidence of generalization that withstands real-world variation. Where possible, the team plans to share model cards, evaluation protocols, and selected research artefacts to support reproducibility and community benchmarking. External validation is also under way through collaborations with partner institutions worldwide.
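
One way to make "stable performance across centers and devices" concrete is to report a metric per site and look at the spread. The sketch below does this for sensitivity. It is only an assumed illustration: the `sensitivity_by_group` helper, the site names, the threshold, and the records are hypothetical and do not describe the team's actual protocol.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, int, float]  # (group name, true label, model score)


def sensitivity_by_group(
    records: Iterable[Record], threshold: float = 0.5
) -> Dict[str, float]:
    """Return sensitivity per group, e.g. per center or imaging device.

    Groups with no positive cases are omitted because sensitivity is
    undefined for them.
    """
    true_positives = defaultdict(int)
    positives = defaultdict(int)
    for group, label, score in records:
        if label == 1:
            positives[group] += 1
            if score >= threshold:
                true_positives[group] += 1
    return {g: true_positives[g] / positives[g] for g in positives}


# Hypothetical records from two made-up sites.
records = [
    ("site_A", 1, 0.9), ("site_A", 1, 0.7), ("site_A", 0, 0.2),
    ("site_B", 1, 0.8), ("site_B", 1, 0.3), ("site_B", 0, 0.1),
]

per_site = sensitivity_by_group(records)
gap = max(per_site.values()) - min(per_site.values())
print(per_site, "largest gap:", gap)
```

A large gap between the best and worst site would suggest the model is sensitive to device or population differences, even if the pooled figure looks acceptable.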