bims-aukdir Biomed News
on Automated knowledge discovery in diabetes research
Issue of 2026-05-10
23 papers selected by
Mott Given



  1. Am J Ophthalmol. 2026 May 02. pii: S0002-9394(26)00227-8. [Epub ahead of print]
     PURPOSE: To benchmark multiple automated machine learning (AutoML) platforms for diabetic retinopathy (DR) screening from fundus photographs using a unified training and evaluation framework, with human consensus grading and an FDA-approved autonomous system (IDx-DR) as reference standards.
    DESIGN: Retrospective diagnostic performance and benchmarking study.
    METHODS: Image classifiers were trained on large public datasets labeled according to the International Clinical Diabetic Retinopathy (ICDR) scale (APTOS, n = 5,590; DDR, n = 12,524; EyePACS, n = 31,557) after automated image-quality filtering. Performance was evaluated on an independent, institutionally collected patient-level test cohort (n = 726) using the highest DR grade across all images for patient-level classification. The evaluated platforms included Google Vertex AI, Amazon Rekognition, Amazon SageMaker Canvas, AutoGluon, AutoKeras, and Apple CreateML. Models were assessed for three screening endpoints, namely any DR, referable DR (RDR), and sight-threatening DR (STDR), across probability thresholds of 20%, 50%, and 70%. The primary endpoints were the area under the receiver operating characteristic curve (AUC) and sensitivity for RDR at a 50% decision threshold. Secondary endpoints included specificity, positive predictive value, negative predictive value, accuracy, and F1-score with 95% confidence intervals. Pairwise comparisons were performed using bootstrap testing (n = 1,000) for AUC differences and McNemar's test (at a 50% threshold) for binary outcomes, both subject to Bonferroni correction (p < 0.0033). Grad-CAM was applied to locally deployable convolutional neural network-based models.
    RESULTS: Using human consensus grading as the reference standard, Amazon SageMaker Canvas and AutoGluon demonstrated the strongest overall discrimination, achieving AUC values up to 0.96 for STDR and 0.93-0.94 for RDR. At the 50% decision threshold, Canvas showed the most balanced performance for RDR (sensitivity 88.3%, specificity 85.5%, accuracy 86.0%), whereas AutoGluon favored sensitivity (any DR sensitivity 95.9%) at the expense of specificity. Vertex AI showed consistently weaker and unstable performance (any DR AUC 0.58; RDR AUC 0.38). Relative to IDx-DR, Amazon Rekognition and Canvas showed the highest agreement, particularly for STDR (AUC up to 0.88-0.90; κ up to ∼0.56). Agreement with human graders was generally low to moderate (κ ≈ 0.3-0.6) and increased at higher probability thresholds.
    CONCLUSIONS: AutoML platforms can achieve clinically meaningful performance for DR screening. Differences across tools and thresholds reflect their adaptability to diverse clinical settings, underscoring the importance of external validation and threshold calibration.
    Keywords:  automated machine learning; diabetic retinopathy; image analysis; retinal fundus photograph
    DOI:  https://doi.org/10.1016/j.ajo.2026.04.030
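For readers implementing a similar comparison, the paired bootstrap for AUC differences described above can be sketched in plain Python. This is an illustrative sketch, not the authors' code: `auc` is a simple Mann-Whitney implementation, a 95% percentile interval is reported in place of the study's Bonferroni-corrected p-values, and all names and inputs are hypothetical.

```python
import random

def auc(labels, scores):
    """Mann-Whitney AUC: probability a positive case outranks a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=1000, seed=42):
    """Paired bootstrap: resample cases with replacement, recompute the AUC
    difference each time, and return the observed difference together with
    a 95% percentile interval over the resampled differences."""
    rng = random.Random(seed)
    n = len(labels)
    observed = auc(labels, scores_a) - auc(labels, scores_b)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # both classes must be present to compute AUC
            diffs.append(auc(ys, [scores_a[i] for i in idx])
                         - auc(ys, [scores_b[i] for i in idx]))
    diffs.sort()
    return observed, (diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot)])
```

For a Bonferroni-corrected comparison at p < 0.0033, one would widen the percentile interval accordingly (e.g., a 99.67% interval) before checking whether it excludes zero.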
  2. Front Digit Health. 2026;8:1768780
      Type 2 diabetes mellitus (T2DM) is associated with multi-organ complications, including cardiovascular and renal disease. Fundus photography provides a non-invasive window into systemic microvascular health, and artificial intelligence (AI) has enabled extraction of retinal biomarkers for systemic risk prediction beyond diabetic retinopathy detection. We conducted a methodologically structured scoping review following PRISMA-ScR guidance to map AI applications in retinal imaging for multi-organ risk stratification in T2DM. Studies using machine learning or deep learning models to predict cardiovascular, renal, or cerebrovascular outcomes were identified and characterized. Rather than quantitative pooling, we examined modeling strategies, validation approaches, performance reporting, and translational readiness across heterogeneous study designs. AI models frequently demonstrated promising discrimination; however, substantial heterogeneity was observed in cohort size, outcome definitions, imaging modalities, and validation strategies. External validation was limited, calibration was inconsistently assessed, and subgroup analyses addressing fairness and device-related domain shift were rarely reported. Most studies emphasized discrimination metrics without comprehensive evaluation of clinical utility. Retinal AI shows potential for scalable systemic risk surveillance in T2DM, but rigorous external validation, standardized reporting, and prospective implementation studies are required to enable safe and equitable clinical translation.
    Keywords:  artificial intelligence; biomarkers; clinical translation; deep learning; diabetic retinopathy; multi-organ complications; personalized medicine; retinal fundus imaging
    DOI:  https://doi.org/10.3389/fdgth.2026.1768780
  3. Exp Eye Res. 2026 May 05. pii: S0014-4835(26)00202-2. [Epub ahead of print] 268:111046
       PURPOSE: Diabetic retinopathy (DR) is a leading cause of vision impairment worldwide. Optical coherence tomography (OCT) and OCT angiography (OCTA) provide detailed retinal imaging, enabling early detection of microvascular changes. This study aims to systematically review artificial intelligence (AI), particularly deep learning (DL), applications for DR detection and analysis using OCT and OCTA images.
    METHODS: A comprehensive literature search was conducted across PubMed, Web of Science, Scopus, IEEE Xplore, and Embase for studies published up to March 2026. A total of 1007 articles were identified, of which 175 studies met the inclusion criteria following the PRISMA study selection process.
    RESULTS: DL-based approaches consistently demonstrated superior performance compared to traditional machine learning (ML) methods, with reported AUC values typically ranging from 0.90 to 0.99 across classification and segmentation tasks. Convolutional neural networks (CNNs), Vision Transformers (ViTs), and encoder-decoder architectures such as U-Net showed strong performance in detecting key DR biomarkers, including microaneurysms, macular edema, and neovascularization. However, performance variability was observed depending on dataset size, imaging modality, and annotation quality.
    CONCLUSIONS: AI-driven analysis of OCT and OCTA images offers significant potential for automated DR detection. Despite promising results, challenges such as limited public datasets, lack of cross-institutional validation, and model interpretability remain. Future research should focus on multimodal integration, explainable AI, and large-scale validation to enhance clinical applicability.
    DOI:  https://doi.org/10.1016/j.exer.2026.111046
  4. Bioengineering (Basel). 2026 Apr 21. pii: 480. [Epub ahead of print] 13(4)
      Diabetes mellitus is a rapidly growing health problem worldwide, affecting more than 347 million people globally. Importantly, the disease can be detected in its early stages, enabling physicians to avoid complications and improve patient outcomes. Although machine learning (ML) has been extensively used in diabetes classification, the available solutions tend to place little or no emphasis on feature selection and ensembles, which limits prediction accuracy and generalizability. In this study, we introduce a hybrid framework based on three feature-selection algorithms, specifically the genetic algorithm (GA), correlation-based feature selection (CFS), and recursive feature elimination (RFE), in single and hybrid forms, and three classifiers, namely the multi-layer perceptron (MLP), support vector machine (SVM), and random forest (RF), to achieve greater predictive robustness with the aid of soft voting. Experimental findings obtained from a benchmark diabetes dataset indicate that the RFE + CFS + SVM combination achieves the best performance, with an accuracy of 98.0%, sensitivity of 97.43%, specificity of 99.03%, precision of 99.51%, and F1-score of 98.72%. These results indicate that the suggested hybrid feature-selection and ensemble-learning model offers a robust and highly effective approach for early-stage diabetes diagnosis, one which clinicians may use to make timely and accurate decisions.
    Keywords:  PIAM and Frankfurt; classification; diabetic prediction; feature selection
    DOI:  https://doi.org/10.3390/bioengineering13040480
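The soft-voting step described above, in which MLP, SVM, and RF predictions are combined, amounts to averaging class-probability vectors and taking the argmax. A minimal sketch, not the authors' implementation; the per-classifier probabilities and optional weights below are hypothetical inputs:

```python
def soft_vote(prob_vectors, weights=None):
    """Soft voting: average each classifier's class-probability vector
    (optionally weighted) and predict the class with the highest mean."""
    k = len(prob_vectors[0])                     # number of classes
    weights = weights or [1.0] * len(prob_vectors)
    total = sum(weights)
    avg = [sum(w * p[c] for w, p in zip(weights, prob_vectors)) / total
           for c in range(k)]
    return max(range(k), key=avg.__getitem__), avg
```

With three classifiers emitting [0.6, 0.4], [0.4, 0.6], and [0.9, 0.1] for a binary task, the averaged vector favors class 0 even though one classifier disagrees, which is the intended smoothing effect of soft voting.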
  5. Bioengineering (Basel). 2026 Apr 12. pii: 450. [Epub ahead of print] 13(4)
      Diabetic retinopathy (DR) remains a major cause of vision loss in patients with diabetes, and earlier recognition of retinal vascular abnormalities may improve risk stratification and clinical follow-up. Optical coherence tomography angiography (OCTA) provides a noninvasive way to visualize the retinal microvasculature and may detect DR-related changes before they are evident on routine clinical assessment. In this work, we investigated whether dividing OCTA images into anatomically defined retinal regions could improve DR classification and clarify which regions carry the greatest discriminative information. The study included 188 OCTA images: 67 from normal eyes, 57 from eyes with mild DR, and 64 from eyes with moderate DR. Each image was divided into seven concentric regions centered on the fovea, and vessel-density features were extracted from each region. Ten machine learning classifiers were trained and compared at the regional level. For each region, the best-performing classifier was retained, and the final prediction was obtained with a majority-voting ensemble. To examine model behavior, Local Interpretable Model-Agnostic Explanations (LIME) were applied. Performance was also compared with that of a transfer-learning MobileNet model trained on whole OCTA images. On the held-out patient-level test set, the ensemble model achieved 97% accuracy, 98% precision, 97% recall, and a 97% F1-score for three-class classification. These results were higher than those obtained with the tested whole-image transfer-learning baselines. The interpretability analysis consistently identified the parafoveal regions as the most informative for classification. Among the seven regions, Region 3 showed the highest overall contribution, followed by Regions 2 and 5, whereas Region 5 became more influential in moderate DR. 
These results suggest that regional analysis of OCTA-derived vessel density can improve both classification performance and interpretability in DR assessment. The findings also indicate that parafoveal vascular alterations carry substantial discriminative value in distinguishing normal, mild DR, and moderate DR cases. Validation in larger, independent cohorts from multiple centers will be necessary to confirm the generalizability of these findings.
    Keywords:  Local Interpretable Model-Agnostic Explanations (LIME); diabetic retinopathy; ensemble models; explainable AI; optical coherence tomography angiography (OCTA); regional feature extraction; retinal microvasculature
    DOI:  https://doi.org/10.3390/bioengineering13040450
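Extracting vessel-density features from concentric regions, as the study above does, can be illustrated with a toy binary vessel mask. This is a sketch only: the fovea center, ring radii, and mask are made-up inputs, and the authors' actual OCTA preprocessing and region definitions are not detailed in the abstract.

```python
def regional_vessel_density(mask, center, radii):
    """Vessel density (fraction of vessel pixels) inside each concentric ring
    around `center`; `radii` are increasing outer radii, one per region."""
    cy, cx = center
    area = [0] * len(radii)
    vessel = [0] * len(radii)
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            d = ((y - cy) ** 2 + (x - cx) ** 2) ** 0.5
            for i, r in enumerate(radii):
                if d <= r:            # assign pixel to first ring covering it
                    area[i] += 1
                    vessel[i] += v
                    break
    return [vessel[i] / area[i] if area[i] else 0.0 for i in range(len(radii))]
```

The per-region densities would then serve as the feature vector fed to the regional classifiers; seven radii would reproduce the seven-region layout described above.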
  6. Sci Rep. 2026 May 05.
      Diabetic retinopathy (DR) diagnosis from digital fundus images is a long-standing topic of research in medical image processing. Determining optic disk boundaries in two-dimensional retinal images is difficult because of blurred edges, and no single technique solves all of these problems; an efficient algorithm for identifying DR-related retinal changes and structures is still needed. If DR is recognized and treated in a timely manner, visual deterioration can be managed or avoided. Screening currently relies on telemedicine analysis of color fundus photographs or clinical evaluation by physicians; both approaches are time-consuming, labor-intensive, and, owing to intrinsic human subjectivity, prone to inaccuracy. Automated methods capable of analyzing color fundus photographs with high specificity and sensitivity have therefore become important for the broad deployment of DR screening. To detect DR-related characteristics and to cope with the various diabetes severity diagnosis phases, a hybrid quantum convolutional neural network (HQCNN) is presented. The Kaggle fundus image database is used to train and test the network. Finally, the presented work is evaluated using metrics including precision, specificity, accuracy, sensitivity, and F1 score, obtaining an accuracy of 98.89%, sensitivity of 99.37%, specificity of 99.57%, precision of 98.89%, and F1 score of 97.58%.
    Keywords:  CNN; Diabetic retinopathy; Fundus images; Image processing; Quantum computing; Retina
    DOI:  https://doi.org/10.1038/s41598-026-49227-2
  7. Ophthalmol Sci. 2026 Jun;6(6):101132
     Purpose: To evaluate the performance of a deep learning (DL) model in classifying diabetic retinopathy (DR) severity using fundus images with varying fields of view and to assess whether central retinal features alone reflect overall disease burden. The study also investigates vascular biomarkers within regions contributing to model decisions to explore the biological basis of the model's inferences and enhance clinical interpretability.
    Design: An observational, cross-sectional study.
    Participants: A total of 2610 participants aged ≥40 years from a population-based study in South India, with ocular and systemic data, including dilated fundus images and glycated hemoglobin levels.
    Methods: Diabetic retinopathy severity was graded on unmasked 200° ultra-widefield (UWF) and centrally masked 45° images, with peripheral lesions annotated in the 155° field (200°-45°). Convolutional neural network was trained on labeled images and evaluated on both datasets. Performance metrics and receiver operating characteristic (ROC) curves were calculated. Gradient-weighted class activation mapping (Grad-CAM) identified model focus. Vascular biomarkers (tortuosity, fractal dimension, and vessel density) were quantified using LWNet and Fiji software and compared between eyes with and without peripheral lesions.
    Main Outcome Measures: Classification accuracy of DR severity, precision and recall, ROC curves, Grad-CAM visualization of regions of model focus, and vascular biomarkers associated with peripheral lesions.
    Results: The DL model achieved high classification accuracy: 97.12% for UWF images, 97.24% for 45° masked images with re-evaluated labels, and 96.86% for masked images with original labels. Peripheral pathology, including microaneurysms (19.8%) and hemorrhages/exudates (9.1%), did not affect accuracy. Deep learning-based assessment of 45° fundus images reduced DR underestimation from 16.5% to 6.2%. Gradient-weighted class activation mapping highlighted focus on central regions, including areas without visible lesions. Vascular analysis revealed differences in vessel density and tortuosity between eyes with and without peripheral microvascular abnormalities and neovascularization, suggesting detection of subclinical vascular changes linked to peripheral disease.
    Conclusions: Deep learning applied to 45° fundus images can accurately classify DR and detect subtle vascular biomarkers predictive of peripheral disease. This proof-of-concept highlights the potential of artificial intelligence (AI)-enhanced 45° imaging as a scalable tool for DR screening. Such AI-powered approaches, using accessible and affordable fundus cameras, may enable cost-effective detection and triage of high-risk cases in primary care and resource-limited settings.
    Financial Disclosures: The authors have no proprietary or commercial interest in any materials discussed in this article.
    Keywords:  Deep learning; Diabetic retinopathy; Retinal vascular biomarkers; Ultra-widefield imaging
    DOI:  https://doi.org/10.1016/j.xops.2026.101132
  8. Sensors (Basel). 2026 Apr 18. pii: 2510. [Epub ahead of print] 26(8)
      Diabetic retinopathy represents one of the leading causes of blindness worldwide, making early diagnosis essential for effective clinical intervention. We propose an explainable method aimed at automatically identifying the severity levels of diabetic retinopathy in retinal images using deep learning. The proposed method considers several convolutional neural network architectures (VGG16, StandardCNN, ResNet, CustomCNN, EfficientNet, and MobileNet) alongside FGNet, a novel architecture designed and developed by the authors for diabetic retinopathy detection. The proposed network achieves an accuracy of 0.75 when trained for 10 epochs and 0.71 for 20 epochs. Explainability of model predictions is further supported through Gradient-weighted Class Activation Mapping, providing visual insight into the learned decision-making process and potentially supporting early clinical assessment.
    Keywords:  artificial intelligence; classification; convolutional neural network; deep learning; diabetic retinopathy
    DOI:  https://doi.org/10.3390/s26082510
  9. Am J Med Open. 2026 Jun;15:100132
       Aims: To evaluate county-level incidence of diagnosed diabetes and key sociodemographic factors in a high-dimensional, nonlinear setting.
    Methods: This temporally aggregated observational study used US Centers for Disease Control and Prevention data on county-level incidence of diagnosed diabetes, from 2004 to 2019, and 34 sociodemographic factors from public databases. We defined counties as higher-burden if diabetes incidence was >12.6 per 1000 persons (1 standard deviation [SD] above sample mean). As relationships between sociodemographic factors and diabetes incidence may be nonlinear and involve complex interactions, we trained three machine learning models to estimate incidence (elastic net regression), classify counties as higher-burden (eXtreme Gradient Boosting [XGBoost], support vector machine [SVM]), and identify feature importance. Model performance was evaluated using fivefold cross-validation, with stratified folds for XGBoost and SVM models.
    Results: Overall, 500 of 3114 counties (16.1%) were higher-burden. Elastic net regression showed good predictive performance for estimating diabetes incidence (R² = 0.78 [95% CI, 0.75-0.80]). For classification of higher-burden counties, SVM and XGBoost showed high discrimination, with AUROC of 0.962 (95% CI, 0.948-0.974) and 0.957 (95% CI, 0.941-0.971), respectively. Sensitivity analyses using alternative definitions of higher-burden counties (mean + 0.75 × SD; mean + 1.25 × SD) yielded comparable results. Across all three models, key county-level features contributing to model predictions were the percentages of children living with grandparent householders and of people with limited English.
    Conclusions: Machine learning models demonstrated consistent performance in estimating and classifying county-level diabetes incidence, with high discrimination for identifying higher-burden counties. Sociodemographic factors, including children living with grandparent householders, may inform tailored public health interventions.
    Keywords:  Diabetes; Machine learning; Social determinants of health; Social vulnerability index
    DOI:  https://doi.org/10.1016/j.ajmo.2026.100132
  10. Front Med (Lausanne). 2026;13:1778534
       Background: Diabetic retinopathy (DR) causes severe vision impairment that requires early screening methods for effective detection. The combination of Artificial Intelligence (AI) and tele-ophthalmology technology provides an effective solution that enhances both DR detection rates and patient access to care.
    Objective: The study aims to assess how well AI-based, telemedicine, smartphone, conventional, and population-based screening methods detect diabetic retinopathy in terms of effectiveness, accuracy, and real-world performance.
    Methodology: Researchers executed a comprehensive literature search across PubMed, Scopus, Web of Science, and Embase to find articles published between 2012 and 2025. The study examined three types of studies: AI-based DR screening, tele-ophthalmology, and evaluations of standard diagnostic methods. The analysis used random-effects meta-analysis to estimate pooled odds ratios (ORs), assess heterogeneity using I2, and test for publication bias with funnel plots and Egger's test.
    Results: The study included 45 different research studies. Standalone deep learning/AI tools (6 studies) demonstrated a pooled OR of 5.79 (95% CI: 5.22-6.42; p < 0.05), with low heterogeneity and no evidence of publication bias. Automated AI systems evaluated against human grading (13 studies) yielded an OR of 5.48 (95% CI: 5.09-5.90; p < 0.05), indicating consistent effect sizes. Smartphone-based AI screening showed an OR of 4.73 (95% CI: 3.96-5.66; p < 0.05) with moderate heterogeneity (I2 = 54%). Tele-ophthalmology/remote screening (11 studies) reported an OR of 4.91, whereas conventional physician screening (3 studies) reported an OR of 4.96, yielding consistent results with low heterogeneity. Population-based/community screening (5 studies) demonstrated an OR of 4.90 (95% CI: 4.33-5.54; p < 0.05) and exhibited some signs of publication bias according to Egger's test (p = 0.019). All approaches achieved statistically significant improvements in DR detection.
    Conclusion: AI-based screening, including deep learning algorithms, automated grading, and tele-ophthalmology, shows high diagnostic accuracy and consistent effectiveness. The implementation of smartphone- and population-based approaches is feasible, yet their outcomes vary, necessitating validation through context-specific assessments.
    Keywords:  artificial intelligence; deep learning; diabetic retinopathy; meta-analysis; screening effectiveness; smartphone screening; tele-ophthalmology
    DOI:  https://doi.org/10.3389/fmed.2026.1778534
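Random-effects pooling of odds ratios, as used in the meta-analysis above, is commonly done with the DerSimonian-Laird estimator. The abstract does not state which estimator was used, so the following is an illustrative sketch only, with hypothetical per-study log-ORs and standard errors as inputs:

```python
import math

def pool_random_effects(log_ors, ses):
    """DerSimonian-Laird random-effects pooling of study log odds ratios.
    Returns the pooled OR with a 95% CI and the I^2 heterogeneity estimate."""
    w = [1.0 / se ** 2 for se in ses]                      # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))  # Cochran's Q
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0        # between-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]          # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se_pooled = (1.0 / sum(w_re)) ** 0.5
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se_pooled),
            math.exp(pooled + 1.96 * se_pooled),
            i2)
```

When all studies report the same effect, tau² and I² collapse to zero and the pooled OR reduces to the inverse-variance fixed-effect estimate.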
  11. Front Endocrinol (Lausanne). 2026;17:1816599
       Background: In-hospital hypoglycemia remains a serious and potentially life-threatening complication among adults with type 1 diabetes mellitus (T1DM), yet reliable and interpretable prediction tools for Chinese inpatients are lacking. We aimed to develop and validate an interpretable machine learning model using multicenter inpatient data to predict the risk of in-hospital hypoglycemia in adults with T1DM, and to enhance clinical understanding of key predictors.
    Methods: This multicenter retrospective cohort study enrolled adult inpatients with T1DM from five tertiary Grade A hospitals in China between January 1, 2019 and September 30, 2025. From the same multicenter cohort, the total dataset was randomly split 7:3 into a development set (n = 1,048) and an independent external validation set (n = 450). Within the development set, we performed 5-fold stratified cross-validation for hyperparameter tuning, and both internal cross-validation and external validation remained fully independent throughout model development. Machine learning models were trained to predict in-hospital hypoglycemia and evaluated for discrimination, calibration, clinical utility, and interpretability.
    Results: The study enrolled 1,498 patients, of whom 580 (38.7%) experienced in-hospital hypoglycemia. The random forest model demonstrated superior predictive performance in the external validation cohort, achieving an AUC of 0.831 (95% CI: 0.798-0.873), sensitivity of 0.793, specificity of 0.748, and a Brier score of 0.149. Hemoglobin, potassium, sodium, low-density lipoprotein cholesterol, and age at onset were identified as the top predictors. Hemoglobin, potassium, sodium, and BMI exhibited U-shaped associations with hypoglycemia risk, where both low and high values increased risk. Exploratory analysis of joint biomarker status showed that patients with abnormalities in two or more of these core predictors had a non-significant trend toward higher event rates, while the complexity of their combined effects was better captured by the non-linear model. The model enabled effective risk stratification into four quartiles, and decision curve analysis confirmed its consistent net clinical benefit across relevant probability thresholds.
    Conclusions: The interpretable random forest model using routine inpatient data showed strong discrimination, good calibration and useful risk stratification for in-hospital hypoglycemia in Chinese adults with T1DM, which may help identify high-risk patients early and guide targeted preventive interventions in clinical practice.
    Keywords:  hypoglycemia; machine learning; multicenter; predictive model; type 1 diabetes mellitus
    DOI:  https://doi.org/10.3389/fendo.2026.1816599
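Decision curve analysis, used above to confirm net clinical benefit, reduces to computing net benefit across a range of probability thresholds. A minimal sketch under the standard definition, not the authors' code, with hypothetical labels and predicted risks:

```python
def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit at one probability threshold: the
    true-positive fraction credited, minus the false-positive fraction
    weighted by the odds of the threshold (the harm-to-benefit ratio)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)
```

A model shows net clinical benefit at a threshold when its net benefit exceeds both the treat-all strategy (every `y_prob` set to 1.0) and the treat-none strategy (net benefit 0).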
  12. Bioengineering (Basel). 2026 Mar 25. pii: 377. [Epub ahead of print] 13(4)
      To perform screening of the retina on a population scale, an automated procedure is required that incorporates accurate, reproducible, interpretable, and computationally cost-effective models. Existing approaches using convolutional or transformer architectures typically do not adequately represent both fine-grained pathology and large-scale retinal context simultaneously, which could adversely affect their reliability if used for large-scale applications in clinical practice. In this paper, we propose a hierarchical transformer-based screening framework for retinal fundus images that incorporates patch-based tokenization, global transformer encoding, and hierarchical aggregation of contextual information. We also developed a lightweight prediction head that supports screening for both single and multiple diseases. The framework has been evaluated using standard screening metrics, robustness, and cross-dataset generalization analyses on two eye retinopathy image databases: EyePACS and RFMiD. With regard to screening for a binary outcome of diabetic retinopathy, our method provided an accuracy of 89.4% and an area under the receiver operating characteristic (AUROC) curve of 93.6% on EyePACS and attained an accuracy of 95.2% and a macro-averaged F1 score of 82.7% on RFMiD. Our hierarchical transformer achieved improved robustness to degraded images and increased generalizability across datasets compared with all current state-of-the-art models. The proposed hierarchical transformer demonstrates strong potential for large-scale retinal screening and provides a promising foundation for future clinically validated deployment.
    Keywords:  diabetic retinopathy; medical image analysis; multi-disease classification; population-scale healthcare; retinal image screening; vision transformers
    DOI:  https://doi.org/10.3390/bioengineering13040377
  13. Front Med (Lausanne). 2026;13:1815982
       Background: Pretrained foundation models are increasingly adopted for diabetic retinopathy (DR) screening, yet it remains unclear how much of their performance derives from the learned representations versus the adaptation procedure. Most benchmarks report discrimination metrics alone, neglecting probability calibration.
    Methods: We compared the frozen representations of three pretrained encoders: MedSigLIP (medical vision-language; ViT-B/16, 448 × 448), RETFound (retinal self-supervised; ViT-L/16, 224 × 224), and EfficientNet-B0 (ImageNet-supervised; 224 × 224). All encoder weights were frozen; only an identical lightweight multilayer perceptron head was trained. Models were developed on APTOS 2019 (3,662 fundus images; five-fold cross-validation) and externally validated on MESSIDOR-2 (1,744 images). Binary referable DR detection and five-class severity grading were evaluated. AUC, expected calibration error (ECE), and Brier score served as co-primary endpoints. External-set tests used patient-level cluster-robust bootstrap to account for bilateral correlation.
    Results: On the development set, all three encoders achieved near-identical binary AUC (0.980-0.985). MedSigLIP showed superior calibration, with a lower Brier score than RETFound (0.044 vs. 0.049; p = 0.030) and EfficientNet-B0 (0.044 vs. 0.052; p = 0.006). External validation on MESSIDOR-2 revealed divergence: MedSigLIP maintained an AUC of 0.915 (drop 0.070), whereas RETFound fell to 0.697 (drop 0.286) and EfficientNet-B0 to 0.745 (drop 0.236). Retina-specific RETFound performed below the ImageNet baseline (ΔAUC = -0.051; p = 0.016, cluster-robust bootstrap). For five-class grading, MedSigLIP achieved an external macro-F1 of 0.450 versus 0.247 (RETFound) and 0.291 (EfficientNet-B0). Temperature scaling reduced development ECE to 0.014-0.022 but proved ineffective under domain shift (external ECE 0.086-0.149). All encoders exhibited catastrophic failure on mild DR (grade 1) externally, with RETFound and EfficientNet-B0 achieving F1 = 0.000 and MedSigLIP only 0.153.
    Conclusion: Under frozen transfer, the MedSigLIP encoder package produced more generalisable and better calibrated representations than both retinal self-supervised (RETFound) and ImageNet-supervised (EfficientNet-B0) encoders. Domain-specific pretraining did not guarantee domain-general frozen representations. These findings demonstrate that development-set discrimination alone is insufficient for encoder evaluation and that calibration metrics-particularly the Brier score-should be reported as standard practice.
    Keywords:  calibration; diabetic retinopathy; domain shift; external validation; foundation models; frozen encoder; representation transfer; temperature scaling
    DOI:  https://doi.org/10.3389/fmed.2026.1815982
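The calibration endpoints named above, the Brier score and expected calibration error (ECE), are simple to compute for binary predictions. A sketch under their standard definitions (equal-width confidence bins for ECE), not the authors' code:

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE: bin predictions by confidence, then average the absolute gap
    between each bin's mean predicted probability and its observed event
    rate, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        i = min(int(p * n_bins), n_bins - 1)   # p == 1.0 falls in the last bin
        bins[i].append((y, p))
    n = len(y_true)
    ece = 0.0
    for b in bins:
        if b:
            acc = sum(y for y, _ in b) / len(b)
            conf = sum(p for _, p in b) / len(b)
            ece += len(b) / n * abs(acc - conf)
    return ece
```

A perfectly calibrated set of predictions (e.g., confidence 0.9 with a 90% event rate) yields an ECE of zero while its Brier score stays positive, which is why the two metrics are reported together above.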
  14. J Diabetes Sci Technol. 2026 May 08. 19322968261441312
     BACKGROUND: Hemoglobin A1C (HbA1C) is the gold standard for assessing long-term glycemic control in people with diabetes. Increasing use of continuous glucose monitoring (CGM) has led to adoption of the glucose management indicator (GMI) as a CGM-based HbA1C estimate, but GMI often differs from laboratory HbA1C, especially in type 2 diabetes. This discordance may arise because GMI, as a measure of central tendency, fails to capture the temporal glycemic trends and variability that relate to HbA1C formation.
    OBJECTIVE: To evaluate whether combining CGM-derived metrics capturing variability, excursions, and temporal trends improves estimation of laboratory-measured HbA1C in type 2 diabetes.
    METHODS: A machine learning framework was applied to CGM data from a three-month randomized trial, including 159 participants with type 2 diabetes. Participants had ≥70% CGM data coverage and valid end-of-trial HbA1C. From a standardized 90-day CGM window, 51 metrics were extracted. Benchmark models (mean glucose and GMI) were compared with models developed using forward and exhaustive feature selection with threefold cross-validated multiple linear regression.
    RESULTS: Benchmark models yielded R-squared = 0.53. A forward selection model including five metrics (GMI at night, night-to-overall mean glucose ratio, glycemic risk assessment diabetes equation [GRADE], time in tight range [3.0-7.8 mmol/L], and time above range [>13.9 mmol/L] at night) improved R-squared to 0.60. The best-performing model (substituting GRADE at night for GMI at night) achieved a similar R-squared (0.61). Nighttime and hyperglycemia-related metrics were consistently selected.
    CONCLUSION: Continuous glucose monitoring‑based HbA1C estimation improves when variability and temporal patterns are included. Nighttime hyperglycemia adds notable predictive value, though further validation is needed.
    Keywords:  CGM metrics; HbA1C estimation; continuous glucose monitoring; glucose management indicator; machine learning; trend-based glucose modeling; type 2 diabetes
    DOI:  https://doi.org/10.1177/19322968261441312
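The GMI baseline discussed above is a linear function of mean glucose (the published formula: GMI% = 3.31 + 0.02392 × mean glucose in mg/dL), and time-in-range metrics are simple fractions of readings. A sketch with hypothetical CGM readings in mmol/L; the unit conversion and range bounds are the only assumptions:

```python
MGDL_PER_MMOLL = 18.016  # glucose unit conversion factor

def gmi_percent(readings_mmoll):
    """Glucose management indicator from a CGM trace:
    GMI(%) = 3.31 + 0.02392 * mean glucose in mg/dL."""
    mean_mgdl = sum(readings_mmoll) / len(readings_mmoll) * MGDL_PER_MMOLL
    return 3.31 + 0.02392 * mean_mgdl

def time_in_range(readings_mmoll, low, high):
    """Fraction of readings inside [low, high] mmol/L."""
    inside = sum(1 for g in readings_mmoll if low <= g <= high)
    return inside / len(readings_mmoll)
```

The tight range used above corresponds to `time_in_range(trace, 3.0, 7.8)`; richer feature sets like the study's 51 CGM metrics build on the same per-window aggregation idea.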
  15. Bioengineering (Basel). 2026 Mar 24. pii: 374. [Epub ahead of print] 13(4)
      Diabetic retinopathy (DR) is the largest cause of permanent vision loss in the working-age population, making automated grading critical for timely therapeutic intervention. While recent deep learning algorithms have improved feature discrimination, modern state-of-the-art systems have two fundamental drawbacks. First, most models rely on standard Convolutional Neural Networks, which struggle to capture long-range relationships and lack semantic reasoning, resulting in visual findings that do not correlate with clinical knowledge. Second, present approaches often consider grading as a nominal classification or a pure ordinal regression task, failing to strike a compromise between high classification accuracy and severity-consistent predictions (Quadratic Weighted Kappa). To address these challenges, we propose Dual-SwinOrd, a novel framework that integrates a hierarchical Vision Transformer with a semantically guided dual-head mechanism. Specifically, we use a Swin Transformer backbone to extract hierarchical features, effectively capturing global retinal structures. To handle diverse lesion scales, we incorporate a Progressive Lesion-aware Kernel Attention (PLKA) module and a Semantic Prior Modulation (SPM) module guided by PubMedCLIP, bridging the gap between visual features and medical linguistic priors. In addition, we propose a Dual-Head learning strategy that decouples the optimization objective into two parallel streams: a Classification Head to maximize diagnostic accuracy and an Ordinal Regression Head (DPE) to enforce rank-consistency. This design effectively mitigates the trade-off between precision and ordinality. Extensive experiments on the APTOS 2019 and DDR datasets demonstrate that Dual-SwinOrd achieves state-of-the-art performance, yielding an Accuracy of 87.98% and a Quadratic Weighted Kappa (QWK) of 0.9370 on the APTOS 2019 dataset, as well as an Accuracy of 86.54% and a QWK of 0.9040 on the DDR dataset.
    Keywords:  Diabetic Retinopathy; Dual-Head Ordinal Regression; Lesion-aware Attention; PubMedCLIP; Swin Transformer; Vision–Language Priors
    DOI:  https://doi.org/10.3390/bioengineering13040374
  16. Psychogeriatrics. 2026 May;26(3): e70176
       AIMS: This study aimed to identify factors associated with cognitive frailty (CF) in older adults with type 2 diabetes mellitus (T2DM) and to develop a machine learning-based risk prediction model.
    METHODS: Between December 2023 and December 2024, 349 participants were recruited through convenience sampling from the Department of Endocrinology, First Affiliated Hospital of Guangxi Medical University. The participants were randomly divided into a training set (n = 244) and a test set (n = 105) at a ratio of 7:3. Participants completed a structured questionnaire and were classified into CF and non-CF groups. Univariate and binary logistic regression analyses identified significant predictors, which were used as input features for six Machine Learning (ML) algorithms. Shapley additive explanations (SHAP) ranked feature importance and provided interpretability.
    RESULTS: Of 349 older adult patients with T2DM, 87 (23.5%) had CF. Six significant predictors were identified: advanced age, lower educational attainment, insufficient physical activity, depression, malnutrition and a higher number of chronic diabetes-related complications. All models achieved satisfactory performance (AUC > 0.750). The Support Vector Machine (SVM) model performed best (AUC 0.836, accuracy 0.759, precision 0.495, recall 0.699, F1-score 0.575). A web-based application (https://webpredict1.streamlit.app/) was developed from the SVM model to enable individualised CF risk estimation.
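The reported SVM metrics (AUC, accuracy, precision, recall, F1) can be reproduced in outline with scikit-learn; the sketch below uses synthetic data with the paper's class balance (23.5% positives) and a generic RBF-kernel SVM, not the authors' model or features:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import (roc_auc_score, accuracy_score,
                             precision_score, recall_score, f1_score)

# Synthetic stand-in for six tabular predictors (age, education, physical
# activity, depression, malnutrition, complication count are placeholders).
X, y = make_classification(n_samples=349, n_features=6, n_informative=4,
                           weights=[0.765, 0.235], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)

# Scaled RBF-kernel SVM with probability outputs; class_weight compensates
# for the 23.5% prevalence of the positive (cognitive frailty) class.
model = make_pipeline(StandardScaler(),
                      SVC(probability=True, class_weight="balanced",
                          random_state=42))
model.fit(X_tr, y_tr)

prob = model.predict_proba(X_te)[:, 1]
pred = model.predict(X_te)
print(f"AUC={roc_auc_score(y_te, prob):.3f} "
      f"acc={accuracy_score(y_te, pred):.3f} "
      f"P={precision_score(y_te, pred, zero_division=0):.3f} "
      f"R={recall_score(y_te, pred):.3f} "
      f"F1={f1_score(y_te, pred):.3f}")
```

The low precision alongside decent recall in the paper (0.495 vs. 0.699) is typical of screening models on imbalanced cohorts that are tuned to avoid missing cases.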
    CONCLUSION: ML models effectively identified CF in older adult patients with T2DM, with the SVM model achieving the highest accuracy. Addressing the identified risk factors may help reduce CF risk and improve outcomes in this population.
    IMPACT: This study provides nurses with a risk prediction tool for identifying older adults with T2DM who are at high risk of CF and may facilitate the development of effective interventions for CF risk management.
    IMPLICATIONS FOR THE PROFESSION AND/OR PATIENT CARE: High-risk older adults with T2DM can be identified early through this model, enabling nurses to implement tailored interventions that may reduce CF and improve outcomes.
    REPORTING METHOD: The study has adhered to STROBE guidelines.
    PATIENT OR PUBLIC CONTRIBUTION: No patient or public contribution.
    Keywords:  cognitive frailty; machine learning; older adults; risk prediction; type 2 diabetes mellitus
    DOI:  https://doi.org/10.1111/psyg.70176
  17. medRxiv. 2026 Apr 22. pii: 2026.04.21.26351384. [Epub ahead of print]
       Background: Diabetic kidney disease (DKD) is a leading cause of kidney failure in individuals with type 2 diabetes (T2D), yet risk identification in routine clinical practice remains incomplete. A critical and often overlooked barrier is risk observability: how much of a patient's underlying risk is actually captured in their clinical record at the time of screening. Existing prediction models evaluate performance using model-specific thresholds, making it difficult to understand how additional data sources alter real-world screening behavior or which individuals benefit when models are expanded.
    Methods: We developed a series of five nested machine learning models evaluated at a one-year landmark following T2D diagnosis using data from the All of Us Research Program (N = 39,431; cases = 16,193). Each successive model added a distinct information layer -- intrinsic risk, laboratory snapshots, medication exposure, longitudinal care trajectories, and social determinants of health (SDOH) -- while retaining all prior features. All models were evaluated under a fixed screening policy targeting 90% specificity, so that the false positive rate remained constant as the information available to the model grew. External validation was conducted in the BioMe Biobank (N = 9,818) without retraining.
    Results: Discrimination improved consistently across layers, from AUROC 0.673 (M1) to 0.797 (M5). Under the fixed screening policy, sensitivity nearly doubled from 0.27 to 0.49, with a cumulative recovery of 30.4% of cases missed by the base model. Gains were driven by distinct subgroups at each transition: laboratory features identified biologically high-risk individuals; medication features captured those with high treatment intensity reflecting advanced cardiometabolic burden; longitudinal care trajectory features rescued cases with biological instability observable only through repeated measurements; and SDOH features recovered individuals with limited clinical observability, with rescue probability highest among those with the fewest recorded monitoring domains. Sparse data in the clinical record indicated low observability, not low risk. Social and genetic features each contributed most when downstream physiologic signal was limited, supporting a contextual rather than universal role for each. In BioMe, discrimination was attenuated (M4 AUROC 0.659), but the relative ordering of information layers was fully preserved, and a systematic upward shift in predicted probability distributions underscored the need for recalibration before deployment in a new setting.
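The fixed screening policy described above (hold specificity at 90% and let sensitivity vary as information layers are added) amounts to choosing a decision threshold on the ROC curve; a minimal sketch on synthetic risk scores, not data from the study:

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
# Synthetic risk scores: cases score higher on average than controls.
y = np.concatenate([np.zeros(2000), np.ones(800)])
scores = np.concatenate([rng.normal(0.0, 1.0, 2000),
                         rng.normal(1.2, 1.0, 800)])

# Pick the threshold so that specificity >= 90%, i.e. FPR <= 10%;
# sensitivity at that fixed operating point is then the figure of merit.
fpr, tpr, thr = roc_curve(y, scores)
idx = np.searchsorted(fpr, 0.10, side="right") - 1
threshold = thr[idx]
sensitivity = tpr[idx]
print(f"threshold={threshold:.3f}  sensitivity at 90% specificity={sensitivity:.3f}")
```

Comparing models at a shared false-positive rate, rather than at each model's own default threshold, is what lets the authors attribute sensitivity gains to added information rather than to threshold drift.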
    Conclusions: DKD risk detection in T2D is substantially improved by integrating complementary information layers under a fixed clinical screening policy, with gains arising from distinct domains that identify at-risk individuals in different clinical contexts. The layered landmark framework introduced here reveals how risk observability -- shaped by monitoring intensity, healthcare engagement, and access -- determines what a screening model can detect, and provides a foundation for context-aware EHR-based screening that accounts for data availability at the time of risk assessment.
    DOI:  https://doi.org/10.64898/2026.04.21.26351384
  18. J Proteomics. 2026 Apr 30. pii: S1874-3919(26)00066-7. [Epub ahead of print]329 105663
      Pancreatic ductal adenocarcinoma (PDAC) is frequently preceded by new-onset diabetes mellitus (NODM), yet differentiating PDAC-associated DM from type 2 diabetes (T2D) remains clinically challenging. We investigated whether plasma proteomic profiling combined with machine learning could discriminate these conditions. Plasma samples from individuals with PDAC (with and without DM), long-standing T2D, and controls were analyzed by MALDI-TOF mass spectrometry. Spectral features were processed through a nested cross-validation framework to prevent data leakage, and model interpretability was explored using SHAP values. In parallel, low-molecular-weight proteins were characterized by GeLC-MS followed by LC-MS/MS and differential abundance analysis. Machine learning models distinguished PDAC-associated DM from T2D with a balanced accuracy of 85%. Proteomic analyses identified distinct signatures in PDAC-associated DM, including downregulation of erythrocyte-related proteins and PPBP, and upregulation of acute-phase reactants such as FGA, CP, and SERPINA3. Treatment-naïve cases displayed increased circulating epithelial and keratin-associated proteins, which were attenuated after therapy, suggesting dynamic tumor-related remodeling. These findings demonstrate that integrating MALDI-TOF profiling with machine learning can capture plasma signatures associated with PDAC-associated DM. Although exploratory, this approach supports further validation in prospective cohorts aimed at improving PDAC risk stratification among individuals with NODM. SIGNIFICANCE: Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal malignancy with a dismal 5-year survival rate, primarily due to late-stage diagnosis. The frequent occurrence of new-onset diabetes mellitus (NODM) as a paraneoplastic syndrome offers a critical window for early detection. 
However, the clinical challenge of distinguishing PDAC-associated diabetes (PDAC-DM) from type 2 diabetes mellitus (T2D) has hindered the implementation of effective screening strategies. This study addresses this significant clinical problem by leveraging a multi-faceted proteomics approach. We demonstrate that the integration of MALDI-TOF mass spectrometry peptide profiling with machine learning algorithms can accurately discriminate PDAC-DM from T2D with 85% accuracy. Furthermore, we used LC-MS/MS to identify specific low molecular weight proteins that are differentially regulated between these conditions, providing a molecular basis for the observed discrimination. Our work is significant as it presents a novel, high-throughput pipeline for biomarker discovery that combines the scalability of MALDI-TOF with the analytical power of LC-MS/MS and machine learning. The identified plasma signatures hold strong translational potential to improve risk stratification in patients with NODM, ultimately enabling earlier diagnosis of PDAC and improving patient survival prospects. This research directly contributes to the field of clinical proteomics by providing a robust methodological framework and candidate biomarkers for the early detection of one of oncology's most challenging diseases.
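The nested cross-validation framework mentioned above (hyperparameter tuning in an inner loop, performance estimation in an outer loop, so model selection never sees the held-out folds) can be sketched with scikit-learn; the data and hyperparameter grid below are illustrative placeholders, not the authors' setup:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy stand-in for MALDI-TOF spectral features.
X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=1)

# Inner loop tunes hyperparameters; outer loop estimates performance.
# All preprocessing lives inside the Pipeline, so each outer training
# fold is fitted without ever seeing its test fold (no data leakage).
inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
search = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=inner,
                      scoring="balanced_accuracy")
scores = cross_val_score(search, X, y, cv=outer, scoring="balanced_accuracy")
print(f"nested-CV balanced accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
```

Reporting balanced accuracy from the outer folds, as the paper does, avoids the optimistic bias that arises when the same folds are used both to tune and to score a model.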
    Keywords:  MALDI-TOF MS; Machine learning; Pancreatic ductal adenocarcinoma; Plasma proteomics; Type 2 diabetes mellitus
    DOI:  https://doi.org/10.1016/j.jprot.2026.105663
  22. Front Endocrinol (Lausanne). 2026;17: 1784699
       Objective: To develop a machine learning-based classification model to aid in the early diagnosis of diabetic microvascular complications.
    Methods: This study analyzed clinical and laboratory data from 1,498 patients, categorized into two groups: diabetes alone and diabetes with microvascular complications. Independent risk factors for complications were identified through intergroup comparison, collinearity analysis, and logistic regression. Nine machine learning models were subsequently developed and compared. A comprehensive evaluation of the binary classification performance of the Gradient Boosting Decision Tree (GBDT) model was performed.
    Results: Urea, fibrinogen (FIB), prothrombin time (PT), D-dimer (DD), creatine kinase MB isoenzyme (CKMB), lipoprotein(a) (Lpa), activated partial thromboplastin time (APTT), triglycerides (TG), and cholinesterase (CHE) were identified as independent risk factors for diabetic microvascular complications. Among the nine predictive models constructed, the GBDT model demonstrated superior performance across multiple metrics, including the area under the receiver operating characteristic curve (AUC) and sensitivity, indicating strong generalization ability on the validation set. Further evaluation confirmed its consistent and robust predictive performance across training, validation, and test datasets. Calibration curve analysis showed good agreement between predicted probabilities and actual outcomes. Decision curve analysis demonstrated the model's clinical utility, and the Kolmogorov-Smirnov (KS) curve indicated excellent discriminatory power.
    Conclusion: The GBDT model, constructed based on the identified risk factors, exhibits outstanding predictive performance and promising application potential. It provides important theoretical support and a practical tool for the early identification and targeted intervention of diabetic microvascular complications.
    Keywords:  diabetic microvascular complications; gradient boosting decision tree; machine learning; predictive model; risk factors
    DOI:  https://doi.org/10.3389/fendo.2026.1784699
  20. Sci Rep. 2026 May 04.
      Diabetes mellitus remains one of the most widespread and burdensome chronic diseases worldwide, yet invasive assays and high costs constrain early detection. Existing machine-learning studies often reduce diagnosis to a binary task and overlook the clinically important pre-diabetic stage; additionally, many deep models act as uninterpretable "black boxes". To address these gaps, we propose ProgMDD, an interpretable progressive residual network for multiclass diabetes diagnosis using routine clinical biomarkers. Employing a strict, leakage-free pipeline, LASSO-based feature selection and resampling were applied exclusively to the training set, yielding a compact, robust input panel. After comparing PCA, t-SNE, and UMAP, we selected UMAP for visualization because it optimally balances global and local structure to illustrate progressive class separation. ProgMDD integrates a progressive residual architecture with channel attention and multi-level regularization to enhance feature learning. Rigorously compared against multiple baselines, ProgMDD achieved 97.02% mean accuracy under 5-fold cross-validation, reinforced by a 97.59% accuracy on the purely original, imbalanced hold-out test set and supported by multiple ablation studies. The concordance between LASSO and SHAP rankings supports biological plausibility and model transparency. By uniting interpretable deep learning with low-cost clinical data, ProgMDD furnishes a feasible approach for early screening and risk stratification in primary care, providing a transferable methodological paradigm for other chronic-disease prediction tasks.
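The leakage-free pattern described above (L1-based feature selection fitted only on training data) is what a scikit-learn Pipeline enforces automatically; a minimal sketch with synthetic data and an illustrative penalty strength, not the authors' pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in for routine clinical biomarkers.
X, y = make_classification(n_samples=600, n_features=40, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# The L1-penalised selector is fitted inside the pipeline, so it only
# ever sees training data; the hold-out set cannot leak into selection.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=0.5))),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_tr, y_tr)
n_kept = pipe.named_steps["select"].get_support().sum()
print(f"features kept: {n_kept}/40  test accuracy: {pipe.score(X_te, y_te):.3f}")
```

Running selection (or resampling) on the full dataset before splitting, by contrast, inflates hold-out scores, which is the leakage the paper's "strict pipeline" is guarding against.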
    Keywords:  deep learning; diabetes mellitus; interpretability; pre-diabetic stage; progressive residual network; risk stratification
    DOI:  https://doi.org/10.1038/s41598-026-51603-x
  21. Endocrinol Diabetes Metab. 2026 May;9(3): e70193
     INTRODUCTION: This study aimed to validate and compare six IWGDF-approved classification systems for predicting poor prognostic outcomes in diabetic foot ulcers (DFUs), using the same sample in Iran. We also proposed modifications to enhance the performance and feasibility of these tools in the outpatient setting.
    METHODS: A prospective cohort study was conducted involving 616 DFUs from 400 patients over a six-month period. We assessed the performance of six wound classification systems: Wagner, UTWCS, PEDIS/IDSA, SINBAD, WIFI, and DiaFORA. We adjusted for the key variables associated with poor outcomes that could affect the performance of these systems, employing ten machine learning techniques along with the Least Absolute Shrinkage and Selection Operator (LASSO) and random forest methods for feature selection.
    RESULTS: Our findings indicated that the SINBAD and UTWCS systems exhibited comparable effectiveness in predicting outcomes, significantly surpassing the other systems. Notably, modifications to the WIFI system (specifically, redefining the wound depth classification into a clearer category) yielded improved predictive capability, outperforming existing systems such as SINBAD and UTWCS in predicting poor prognostic outcomes in the outpatient setting.
    CONCLUSION: The SINBAD and UTWCS systems yielded the best performance in our sample. Our proposed modification of the WIFI system can enhance its applicability in outpatient services, based on the reported performance.
    Keywords:  diabetic foot; foot ulcer/ulceration; machine learning; ulcer classification; validation
    DOI:  https://doi.org/10.1002/edm2.70193
  22. Ophthalmol Ther. 2026 May 06.
     INTRODUCTION: This study aimed to identify optical coherence tomography (OCT) biomarkers at baseline and after the loading phase (LP) of anti-vascular endothelial growth factor (anti-VEGF) therapy that are predictive of 12-month (12 m) morpho-functional outcomes in diabetic macular edema (DME).
    METHODS: This multicenter, retrospective study involved treatment-naive DME eyes treated with anti-VEGF agents. The OCT volume scans at baseline, after the LP, and at 12 m were analyzed by an artificial intelligence (AI)-derived platform (Discovery OCT Biomarker Detector; RetinAI AG, Bern, Switzerland). Different retinal layer thicknesses and volumes, intraretinal fluid (IRF), subretinal fluid (SRF), and biomarkers probability detection, including hyperreflective foci (HF) were measured. A random forest model assessed the predictive factors for final morphological and functional outcomes.
    RESULTS: A total of 77 treatment-naive DME eyes from 64 patients treated with anti-VEGF (88.3% aflibercept, 11.7% ranibizumab; mean number of injections, 9.93 ± 3.18) were enrolled. A significant reduction in all retinal layer thicknesses, IRF, SRF, and retinal volumes (p < 0.05) was found after the LP and at 12 m. The random forest model revealed that a higher baseline IRF volume was a moderate predictor and a lower outer nuclear layer (ONL) thickness after LP was a strong predictor for a good morphological response at 12 m. Best-corrected visual acuity (BCVA) prediction remained limited due to weaker associations with OCT biomarkers.
    CONCLUSIONS: AI-derived software showed promise in detecting OCT biomarkers and improving 1-year outcome prediction in DME management. Baseline IRF volume and ONL thickness after the LP were strong predictors of achieving a structural response at 12 m, with overall good model performance.
    Keywords:  AI; DME; OCT biomarkers; ONL thickness; Prognostic biomarkers
    DOI:  https://doi.org/10.1007/s40123-026-01386-1
  23. Sensors (Basel). 2026 Apr 21. pii: 2552. [Epub ahead of print]26(8):
      Ramadan fasting substantially alters meal timing, sleep patterns, and daily activity, thereby increasing the risk of hypoglycaemia in adults with type 1 diabetes (T1D). Although continuous glucose monitoring (CGM) systems provide real-time alerts, these are largely reactive or limited to short prediction horizons, offering insufficient warning under fasting-related behavioural and circadian disruption. This study aims to evaluate whether behaviour-aware, temporally enriched recurrent deep learning models, leveraging multimodal CGM and wearable-derived signals, can forecast hypoglycaemia one hour ahead during Ramadan and the post-fasting period. In an observational, free-living cohort study conducted in Qatar, 33 adults with T1D were monitored using CGM and a wrist-worn wearable during Ramadan 2023 and the subsequent month. Multimodal data were aggregated into hourly features and organised into rolling 36 h sequences. In addition to physiological signals, explicit temporal and circadian proxy features were engineered, including cyclic time encodings, day-night indicators, and Ramadan-specific behavioural windows (e.g., pre-iftar, iftar, post-iftar, and fasting phases). Recurrent models, including LSTM and BiLSTM architectures, were trained using patient-wise, leak-free splits, with focal loss applied to address class imbalance. Model performance was evaluated on a held-out, naturally imbalanced test set using ROC AUC, precision-recall AUC, recall, and probability calibration, alongside cross-phase evaluation between Ramadan and post-fasting periods. Following quality control, 1164 participant-days were retained, with hypoglycaemia accounting for approximately 4% of hourly observations. Temporal feature enrichment and the use of a 36 h lookback window improved both discrimination and calibration, with performance stabilizing beyond this horizon. 
On the imbalanced test set, the best-performing multimodal model achieved an ROC AUC of 0.867 and a precision-recall AUC of 0.341, identifying 77% of next-hour hypoglycaemic events at a sensitivity-focused operating point (precision = 0.14). The selected BiLSTM model demonstrated good probability calibration (Brier score ≈ 0.03). Models trained using wearable-derived inputs alone achieved comparable discrimination and, in some configurations, higher precision-recall AUC than CGM-only baselines. Notably, models trained on the original imbalanced data outperformed resampled variants, suggesting that temporal and behavioural features provided sufficient discriminatory signal without requiring aggressive class balancing. Cross-phase evaluation indicated robust generalisation, particularly for the BiLSTM model. Overall, behaviour-aware, temporally enriched multimodal models can provide calibrated, hour-ahead hypoglycaemia risk estimates during Ramadan fasting in adults with T1D, enabling proactive intervention beyond reactive CGM alerts. Explicit modelling of circadian and behavioural dynamics enhances predictive performance under real-world class imbalance. Furthermore, integrating wearable-derived behavioural and physiological signals adds predictive value beyond CGM alone, supporting robustness across varying levels of contextual data availability. External validation and prospective clinical evaluation are required prior to deployment.
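The cyclic time encodings used in the paper map hour-of-day onto the unit circle so the model sees 23:00 and 00:00 as adjacent rather than 23 hours apart; a minimal sketch (the function name is ours, not the authors'):

```python
import numpy as np

def cyclic_hour_features(hour):
    """Map hour-of-day (0-23) onto the unit circle via sine/cosine,
    so that 23:00 and 00:00 are close, unlike a raw linear encoding."""
    angle = 2 * np.pi * np.asarray(hour) / 24.0
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)

hours = np.array([0, 6, 12, 18, 23])
feats = cyclic_hour_features(hours)

# 23:00 sits next to midnight in feature space ...
d_wrap = np.linalg.norm(feats[4] - feats[0])
# ... while noon is maximally far from midnight.
d_noon = np.linalg.norm(feats[2] - feats[0])
print(f"dist(23h, 0h)={d_wrap:.3f}  dist(12h, 0h)={d_noon:.3f}")
```

The same two-component encoding extends to day-of-week or Ramadan-phase indices, which is presumably how the behavioural windows (pre-iftar, iftar, post-iftar, fasting) coexist with continuous circadian proxies in the feature set.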
    Keywords:  Ramadan fasting; class imbalance; continuous glucose monitoring; dataset shift; deep learning; hypoglycaemia forecasting; long short-term memory; probability calibration; type 1 diabetes; wearable sensors
    DOI:  https://doi.org/10.3390/s26082552