Bioengineering (Basel). 2025 Aug 03. pii: 840. [Epub ahead of print]12(8):
The retina offers a unique window into both ocular and systemic health, motivating the development of AI-based tools for disease screening and risk assessment. In this study, we present a comprehensive evaluation of six state-of-the-art deep neural networks, including convolutional neural networks and vision transformer architectures, on the Brazilian Multilabel Ophthalmological Dataset (BRSET), comprising 16,266 fundus images annotated for multiple clinical and demographic labels. We explored seven classification tasks: Diabetes, Diabetic Retinopathy (2-class), Diabetic Retinopathy (3-class), Hypertension, Hypertensive Retinopathy, Drusen, and Sex classification. Models were evaluated using precision, recall, F1-score, accuracy, and AUC. Among all models, Swin-L generally delivered the best performance across tasks: Diabetes (AUC = 0.88, weighted F1-score = 0.86), Diabetic Retinopathy (2-class) (AUC = 0.98, weighted F1-score = 0.95), Diabetic Retinopathy (3-class) (macro AUC = 0.98, weighted F1-score = 0.95), Hypertension (AUC = 0.85, weighted F1-score = 0.79), Hypertensive Retinopathy (AUC = 0.81, weighted F1-score = 0.97), Drusen detection (AUC = 0.93, weighted F1-score = 0.90), and Sex classification (AUC = 0.87, weighted F1-score = 0.80). These AUCs, ranging from 0.81 to 0.98, reflect excellent to outstanding diagnostic performance. We also employed gradient-based saliency maps to enhance explainability and visualize decision-relevant retinal features. Our findings underscore the potential of deep learning, particularly vision transformer models, to deliver accurate, interpretable, and clinically meaningful screening tools for retinal and systemic disease detection.
Keywords: convolutional neural networks; deep learning; explainable AI; fundus images; retinal disease; vision transformers
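
The abstract reports both AUC and weighted F1-score, and the two can diverge (for example, AUC = 0.81 alongside weighted F1-score = 0.97 for Hypertensive Retinopathy). The following is a minimal sketch of how the two metrics are commonly computed, not the authors' evaluation code. The function names `weighted_f1` and `binary_auc` and the toy labels are hypothetical, chosen only to illustrate that a support-weighted F1 can be high on imbalanced data even when ranking quality is at chance level. Whether class imbalance explains any gap in the reported results is not stated in the abstract.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, weighted by each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


def binary_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half (O(n^2), fine for small examples)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic toy data (NOT from BRSET): 95 negatives, 5 positives, and a
# degenerate model that gives every image the same score and predicts 0.
y_true = [0] * 95 + [1] * 5
scores = [0.1] * 100
y_pred = [0] * 100

print(round(weighted_f1(y_true, y_pred), 4))  # 0.9256
print(binary_auc(y_true, scores))             # 0.5
```

In this toy case the majority class dominates the weighted F1-score, while the AUC shows that the scores carry no ranking information, which is one reason to read the two metrics together.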