bims-aukdir Biomed News
on Automated knowledge discovery in diabetes research
Issue of 2025-09-14
thirteen papers selected by
Mott Given



  1. JMIR Med Inform. 2025 Sep 09;13:e67529.
       Background: Artificial intelligence (AI) algorithms offer an effective solution to alleviate the burden of diabetic retinopathy (DR) screening in public health settings. However, challenges remain in translating their diagnostic performance into real-world application.
    Objective: This study aimed to assess the technical feasibility of integration and diagnostic performance of validated DR screening (DRS) AI algorithms in real-world outpatient public health settings.
    Methods: Prior to integrating an AI algorithm for DR screening, the study involved several steps: (1) Five AI companies, including four from India and one international company, were invited to evaluate their diagnostic performance using low-cost nonmydriatic fundus cameras in public health settings; (2) The AI algorithms were prospectively validated on fundus images from 250 people with diabetes mellitus, captured by a trained optometrist in public health settings in Chandigarh Tricity in North India. The performance evaluation used diagnostic metrics, including sensitivity, specificity, and accuracy, compared to human grader assessments; (3) The AI algorithm with better diagnostic performance was integrated into a low-cost screening camera deployed at a community health center (CHC) in the Moga district of Punjab, India. For AI algorithm analysis, a trained health system optometrist captured nonmydriatic images of 343 patients.
    Results: Three web-based AI screening companies agreed to participate, while one declined and one withdrew due to the low specificity identified during the interim analysis. The three AI algorithms demonstrated variable diagnostic performance, with sensitivity ranging from 60% to 80% and specificity from 14% to 96%. Upon integration, the better-performing algorithm, AI-3 (sensitivity: 68%, specificity: 96%, accuracy: 88.43%), demonstrated high sensitivity for image gradability (99.5%), DR detection (99.6%), and referable DR (79%) at the CHC.
    Conclusions: This study highlights the importance of systematic AI validation for responsible clinical integration, demonstrating the potential of DRS to improve health care access in resource-limited public health settings.
    Keywords:  artificial intelligence; diabetic retinopathy; implementation; integration; public health settings; screening; validation
    DOI:  https://doi.org/10.2196/67529
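The sensitivity, specificity, and accuracy figures reported in entry 1 follow directly from a 2x2 screening confusion matrix; a minimal sketch with illustrative counts chosen to mirror the reported 68%/96% figures (not the study's data):

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard screening metrics from 2x2 confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # true-positive rate among diseased
    specificity = tn / (tn + fp)                # true-negative rate among healthy
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # overall fraction correct
    return sensitivity, specificity, accuracy

# Illustrative counts only, not the study's data:
sens, spec, acc = diagnostic_metrics(tp=34, fp=8, fn=16, tn=192)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.3f}")
```

Note how a screening population dominated by non-diseased cases lets accuracy sit well above sensitivity, which is why all three metrics are reported together.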
  2. J Diabetes Sci Technol. 2025 Sep 11:19322968251365245.
      Machine learning (ML) uses computer systems to develop statistical algorithms and models that can draw inferences from demographic data, structured behavioral data, continuous glucose monitor (CGM) tracings, laboratory data, cardiovascular and neurological physiology measurements, and images from a variety of sources. ML is increasingly being used to diagnose complications of diabetes based on these types of datasets. In this article, we review the current status, barriers to progress, and future prospects for using ML across seven applications: five traditional complications of diabetes, one set of other systemic complications, and one outcome prediction that can be favorable or unfavorable. The complications are (1) diabetic retinopathy, (2) diabetic nephropathy, (3) peripheral neuropathy, (4) autonomic neuropathy, (5) diabetic foot ulcers, and (6) other systemic complications; the prediction concerns outcomes in hospitalized patients with diabetes. ML for these purposes is in its infancy, as evidenced by the limited number of products that have received regulatory clearance to date. However, as multicenter reference datasets become available, it will become possible to train algorithms on increasingly large and complex datasets and patterns, so that diagnoses and predictions become increasingly accurate. Novel choices of images and imaging technologies will contribute to progress in this field. ML is poised to become a widely used tool for diagnosing complications and predicting outcomes and glycemia in people with diabetes.
    Keywords:  artificial intelligence; complications; diabetes; diagnosis; machine learning; prognosis
    DOI:  https://doi.org/10.1177/19322968251365245
  3. Clin Ophthalmol. 2025;19:3103-3112.
       Purpose: Diabetic retinopathy (DR) is a leading cause of vision loss in working-age adults. Despite the importance of early DR detection, only 60% of patients with diabetes receive the recommended annual screenings due to limited eye care provider capacity. FDA-approved AI systems were developed to meet the growing demand for DR screening; however, high costs and specialized equipment limit accessibility. More accessible, equally accurate AI systems need to be evaluated to combat this disparity. This study evaluated the diagnostic accuracy of ChatGPT-4 Omni (GPT-4o) in classifying DR from color fundus photographs (CFPs) to assess its potential as a low-cost alternative screening tool.
    Methods: We utilized the publicly available EyePACS DR detection competition dataset from Kaggle, which includes 2,500 CFPs representing no DR, mild DR, moderate DR, severe DR, and proliferative DR. Each image was presented to GPT-4o with 1 of 8 prompts designed to enhance the model's accuracy. The results were analyzed through confusion matrices, and metrics such as accuracy, precision, sensitivity, specificity, and F1 scores were calculated to evaluate performance.
    Results: In prompts 1-3, GPT-4o showed a strong bias towards classifying images as no DR, with an average accuracy of 51.0%, while accuracy for other stages ranged from 70% to 80%. GPT-4o struggled with misclassifications, particularly between adjacent DR levels. It performed best in detecting proliferative DR (Level 4), achieving an F1 score above 0.3 and accuracy exceeding 80%. In binary classification tasks (Prompts 4.1-4.4), GPT-4o's performance improved, though it still had difficulty distinguishing mild DR (49.8% accuracy). When compared to FDA-approved AI systems, GPT-4o's sensitivity (47.7%) and specificity (73.8%) were significantly lower.
    Conclusion: While GPT-4o shows promise in identifying severe DR, its limitations in distinguishing early stages highlight the need for further refinement before clinical use in DR screening. Unlike traditional CNN-based tools such as IDx-DR, GPT-4o is a multimodal foundation model with a fundamentally different architecture and training process, which may contribute to its diagnostic limitations. GPT-4o and other LLMs are not designed to learn important DR features, such as microaneurysms or hemorrhages, from pixel data, which may explain why they struggle to detect DR compared with CNN models.
    Keywords:  EyePACS; artificial intelligence; diabetes; eye screening; large language model; multimodal AI
    DOI:  https://doi.org/10.2147/OPTH.S517238
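Entry 3 reports per-class F1 scores alongside accuracy. Since F1 is the harmonic mean of precision and recall, a rare class can show high accuracy together with a low F1, as a short sketch with illustrative numbers (not the study's) makes clear:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; low whenever either is low."""
    return 2 * precision * recall / (precision + recall)

# With a rare class, modest precision and recall yield an F1 near 0.3
# even when overall accuracy exceeds 80% (illustrative values only):
print(round(f1_score(0.25, 0.40), 3))  # 0.308
```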
  4. Acta Ophthalmol. 2025 Sep 11.
       PURPOSE: Diabetic retinopathy (DR) is a leading cause of vision loss in middle-aged adults globally. Although artificial intelligence (AI)-based screening tools such as IDx-DR (classification) and Thirona RetCAD (regression) have shown high sensitivity in controlled settings, real-world screening faces challenges due to missing or low-quality images and inadequate adaptation to local healthcare needs. The objective was to compare the performance of two AI-based DR screening algorithms (IDx-DR and RetCAD) that analyse non-mydriatic images against ophthalmologists' mydriatic fundoscopy with image analysis, and to assess the impact of a customized referral threshold modification ('Greifswald modification') on screening outcomes.
    METHODS: This one-centre observational study included 1716 patients with diabetes mellitus (Clinical Trials Register: DRKS00035967). Sensitivity, specificity, the proportion of ungradable images and the reduction in ophthalmologic evaluations were assessed. Customized referral threshold modification was conducted using the Youden Index.
    RESULTS: In 98 patients (5.7%), no images could be acquired, and 35 patients (2.1%) had incomplete image sets for IDx-DR. IDx-DR rejected 438 patients (25.5%) due to image quality, while RetCAD flagged 134 eyes from 120 patients (6.9%) but provided output for all. Among analysable images, sensitivities ranged from 70.4% (RetCAD) to 93.6% (RetCAD with Greifswald modification). When all patients were included, sensitivities fell, ranging from 52.7% (IDx-DR) to 79.9% (RetCAD with Greifswald modification). AI screening reduced the need for ophthalmologic examinations by 47.5% to 78.5%.
    CONCLUSIONS: Real-world DR screening performance of AI algorithms, when including non-analysable patients, can be substantially lower than in controlled studies. The use of regression algorithms enabled the customization of referral thresholds, improving screening accuracy and reducing the clinical burden.
    Keywords:  AI adjustment; Youden Index; artificial intelligence; diabetic retinopathy; non-mydriatic imaging; real world
    DOI:  https://doi.org/10.1111/aos.17591
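The 'Greifswald modification' in entry 4 tunes a regression algorithm's referral cutoff via the Youden Index, J = sensitivity + specificity - 1. A minimal sketch of that threshold search on toy scores (an illustration, not the study's implementation):

```python
def youden_threshold(scores, labels):
    """Pick the cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j

# Toy data: referable eyes tend to score higher
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
t, j = youden_threshold(scores, labels)
print(t, j)  # 0.6 1.0 -- this cutoff separates the toy classes perfectly
```

Lowering or raising the chosen cutoff trades sensitivity against specificity, which is what allowed the regression algorithm's threshold to be customized for local screening needs.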
  5. Front Public Health. 2025;13:1606751.
       Background: Obesity is a prevalent and clinically significant complication among individuals with diabetes mellitus (DM), contributing to increased cardiovascular risk, metabolic burden, and reduced quality of life. Despite its high prevalence, the risk factors for obesity within this population remain incompletely understood. With the growing availability of large-scale health datasets and advancements in machine learning, there is an opportunity to improve risk stratification. This study aimed to identify key predictors of obesity and develop a machine learning-based predictive model for patients with type 2 diabetes mellitus (T2DM) using data from the National Health and Nutrition Examination Survey (NHANES).
    Methods: Data from adults with diabetes were extracted from the NHANES 2007-2018 cycles. Participants were categorized into obese and non-obese groups based on BMI. Least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation was used to select relevant features. Subsequently, nine machine learning algorithms-including logistic regression, random forest (RF), radial support vector machine (RSVM), k-nearest neighbors (KNN), XGBoost, LightGBM, decision tree (DT), elastic net regression (ENet), and multilayer perceptron (MLP)-were employed to construct predictive models. Model performance was evaluated based on area under the ROC curve (AUC), calibration curves, Brier score, and decision curve analysis (DCA). The best-performing model was visualized using a nomogram to enhance clinical applicability.
    Results: A total of 3,794 participants with type 2 diabetes were included in the analysis, of whom 57.0% were classified as obese. LASSO regression identified 19 key variables associated with obesity. Among the nine machine learning models evaluated, the logistic regression model demonstrated the best overall performance, with the lowest Brier score. It also showed good discrimination (AUC = 0.751 in the training set and 0.781 in the test set), favorable calibration, and consistent clinical utility based on decision curve analysis (DCA). A nomogram was constructed based on the logistic regression model to facilitate individualized risk prediction, with total points corresponding to predicted probabilities of obesity.
    Conclusion: Obesity remains highly prevalent among individuals with type 2 diabetes. Our findings highlight key clinical features associated with obesity risk and provide a practical tool to aid in early identification and individualized management of high-risk patients.
    Keywords:  LASSO regression; NHANES; diabetes mellitus; machine learning; nomogram; obesity; predictive modeling
    DOI:  https://doi.org/10.3389/fpubh.2025.1606751
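Entry 5 uses LASSO to shrink a larger predictor pool down to 19 key variables. The selection mechanism is easiest to see in the special case of an orthonormal design, where the LASSO solution reduces to soft-thresholding the least-squares coefficients, zeroing the weak ones (a didactic sketch with made-up coefficients, not the study's pipeline):

```python
def soft_threshold(beta_ols, lam):
    """LASSO under an orthonormal design: shrink each OLS coefficient
    toward zero and drop those with magnitude below lambda."""
    return [(1 if b > 0 else -1) * max(abs(b) - lam, 0.0) for b in beta_ols]

# Illustrative coefficients: the two weak predictors are eliminated entirely,
# the strong ones are kept but shrunk by lambda.
beta_lasso = soft_threshold([2.5, -0.3, 0.05, 1.1, -0.8], lam=0.5)
print(beta_lasso)
```

In the general (non-orthonormal) case the same shrink-and-select behavior is obtained by coordinate descent, with lambda chosen by cross-validation as in the study's 10-fold setup.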
  6. Clin Nurs Res. 2025 Sep 10:10547738251367551.
      The increasing prevalence of diabetes mellitus (DM) and patients' lack of self-management awareness have led to a decline in health-related quality of life (HRQoL). Studies identifying potential risk factors for HRQoL in DM patients and presenting generalizable models are relatively scarce. This study aimed to develop and evaluate a machine learning (ML)-based model to predict HRQoL in adult diabetic patients and to examine the important factors affecting it. We extracted factors from the Korea National Health and Nutrition Examination Survey database (2016-2020) based on situation-specific theory, using data from 2,501 adult DM patients. We developed five ML-based HRQoL classifiers for DM patients: logistic regression, naïve Bayes, random forest, support vector machine, and extreme gradient boosting (XGBoost). The models were evaluated using six metrics to determine the best one, and feature importance was computed from Shapley additive explanations (SHAP) values. The XGBoost model showed the best performance, with an accuracy of 0.940, a recall of 0.943, a precision of 0.940, a specificity of 0.919, an F1-score of 0.942, and an area under the curve of 0.984. Based on SHAP values, the top five predictors of HRQoL were self-rated health (1.898), employment (0.822), triglycerides (0.781), education level (0.618), and the aspartate transaminase/alanine transaminase ratio (0.611). The findings confirmed that the ML-based prediction model achieved high accuracy (over 90%) in distinguishing stable and at-risk groups in terms of HRQoL among adult DM patients. The XGBoost model's superior performance supports its potential integration into routine diabetes care as a decision-support tool. Identifying high-risk individuals early can help healthcare providers implement targeted interventions to improve long-term health outcomes.
    Keywords:  diabetes mellitus; machine learning; probability learning; quality of life; risk factors
    DOI:  https://doi.org/10.1177/10547738251367551
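The SHAP importances in entry 6 are estimates of Shapley values, which average a feature's marginal contribution over all subsets of the other features. For a tiny model, the exact computation fits in a few lines (toy contributions loosely echoing the entry's top predictors; not the study's model):

```python
from itertools import combinations
from math import factorial

def shapley_values(f, features):
    """Exact Shapley attribution phi_i for a set function f over features."""
    n = len(features)
    phi = {}
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for k in range(n):                      # subset sizes 0 .. n-1
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (f(set(S) | {i}) - f(set(S)))
        phi[i] = total
    return phi

# Toy additive "model": each present feature adds a fixed amount.
contrib = {"self_rated_health": 1.9, "employment": 0.8, "triglycerides": 0.8}
f = lambda S: sum(contrib[j] for j in S)
phi = shapley_values(f, list(contrib))
print(phi)  # for an additive model, Shapley values equal the raw contributions
```

Practical SHAP implementations avoid this exponential enumeration with model-specific approximations (e.g. tree-based shortcuts for XGBoost), but the attribution they estimate is the same quantity.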
  7. Front Endocrinol (Lausanne). 2025;16:1601883.
      Diabetes mellitus is a metabolic disorder characterized by hyperglycemia resulting from the body's inability to adequately secrete and respond to insulin. Disease prediction using various machine learning (ML) approaches has gained attention because of its potential for early detection. However, it is challenging for ML-based algorithms to capture long-term dependencies, such as glucose trends, in diabetes data. Hence, this research developed a skip-gated recurrent unit (Skip-GRU) with gradient clipping (GC), a deep learning (DL)-based approach, to predict diabetes effectively. The Skip-GRU network captures long-term dependencies while ignoring unnecessary features, passing only the relevant features forward for diabetes prediction. The GC technique is applied during training of the Skip-GRU network to mitigate the exploding-gradients issue. The proposed Skip-GRU with GC approach achieved 98.23% accuracy on the PIMA dataset and 97.65% accuracy on the LMCH dataset, outperforming existing conventional ML-based approaches.
    Keywords:  deep learning; diabetes mellitus; gradient clipping; long-term dependencies; machine learning; skip-gated recurrent unit
    DOI:  https://doi.org/10.3389/fendo.2025.1601883
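The gradient clipping in entry 7 counters exploding gradients during recurrent-network training by rescaling the update whenever its norm grows too large. A minimal sketch of clipping by global L2 norm, a common variant (a generic illustration, not necessarily the paper's exact scheme):

```python
from math import sqrt

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradient vectors when their combined L2 norm exceeds
    max_norm, preserving the update direction while bounding its size."""
    norm = sqrt(sum(g * g for vec in grads for g in vec))
    if norm > max_norm:
        scale = max_norm / norm
        grads = [[g * scale for g in vec] for vec in grads]
    return grads, norm

grads = [[3.0, 4.0], [12.0]]                 # global norm = sqrt(9+16+144) = 13
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
print(norm)  # 13.0 -- the clipped gradients now have global norm 1.0
```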
  8. Can J Diabetes. 2025 Sep 10. pii: S1499-2671(25)00177-7. [Epub ahead of print]
       OBJECTIVES: To develop a machine learning model that accurately predicts the risk of acquiring COVID-19 in community-dwelling adults with type 1 and/or type 2 diabetes in Alberta, Canada.
    METHODS: This predictive supervised machine learning study included adults (>=18 years old) living in Alberta, Canada between April 1, 2019 and March 31, 2021 with pre-existing diabetes (n=372,055, excluding n=2,541 due to migration; final sample size=369,514). The outcome of interest was a positive SARS-CoV-2 PCR test result between March 1, 2020 and March 1, 2021. Model features were extracted from routinely collected Alberta administrative health data from March 1, 2015 to March 1, 2020. Fifteen algorithms were trained on 67% of the data, and the top performer (Light Gradient Boost Model, LGBoost) was validated on the remaining 33%. The model was calibrated, and model performance was assessed using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and threshold analyses.
    RESULTS: Among 369,514 individuals with diabetes, 140,511 were tested, of whom 13,082 had a positive SARS-CoV-2 test. The LGBoost model incorporated 367 features, with an AUROC of 0.69 and an AUPRC of 0.08. The model was well calibrated for common risk thresholds (<0.2 probability) with high specificity (>=0.98 at all thresholds); however, sensitivity and positive predictive values were low at all thresholds (<=0.08 and <=0.18, respectively).
    CONCLUSIONS: The LGBoost model lacked the sensitivity to be clinically useful in predicting SARS-CoV-2 infection in Albertans with diabetes. Alternative data sources may be required to improve future COVID-19 prediction models from the community.
    Keywords:  COVID-19; Diabetes; Epidemiology; Machine Learning; Public Health
    DOI:  https://doi.org/10.1016/j.jcjd.2025.09.001
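The AUROC reported in entry 8 has a useful rank interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A small sketch of that Mann-Whitney formulation on toy scores (not the study's data):

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outranks a random
    negative (ties count one half) -- the Mann-Whitney formulation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.54, 0.53, 0.52, 0.51, 0.505]
labels = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0]
print(auroc(scores, labels))  # 0.75
```

This framing also explains why an AUROC of 0.69 can coexist with a very low AUPRC of 0.08: ranking quality is moderate, but with few true positives the precision at any threshold stays small.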
  9. Healthcare (Basel). 2025 Aug 27;13(17):2138. [Epub ahead of print]
      Background: Diabetes remains a major global health challenge, contributing significantly to premature mortality due to its potential progression to organ failure if not diagnosed early. Traditional diagnostic approaches are subject to human error, highlighting the need for modern computational techniques in clinical decision support systems. Although these systems have successfully integrated deep learning (DL) models, they still encounter several challenges, such as a lack of intricate pattern learning, imbalanced datasets, and poor interpretability of predictions.
    Methods: To address these issues, the temporal inception perceptron network (TIPNet), a novel DL model, is designed to accurately predict diabetes by capturing complex feature relationships and temporal dynamics. An adaptive synthetic oversampling strategy is utilized to reduce severe class imbalance in an extensive diabetes health indicators dataset consisting of 253,680 instances and 22 features, providing a diverse and representative sample for model evaluation. The model's performance and generalizability are assessed using 10-fold cross-validation. To enhance interpretability, explainable artificial intelligence techniques are integrated, including local interpretable model-agnostic explanations and Shapley additive explanations, providing insights into the model's decision-making process.
    Results: Experimental results demonstrate that TIPNet achieves improvement scores of 3.53% in accuracy, 3.49% in F1-score, 1.14% in recall, and 5.95% in the area under the receiver operating characteristic curve.
    Conclusions: These findings indicate that TIPNet is a promising tool for early diabetes prediction, offering accurate and interpretable results. The integration of advanced DL modeling with oversampling strategies and explainable AI techniques positions TIPNet as a valuable resource for clinical decision support, paving the way for its future application in healthcare settings.
    Keywords:  Shapley additive explanations; adaptive synthetic oversampling; confidence interval; deep learning; diabetes prediction; inception network; k-fold cross validation; local interpretable model-agnostic explanations; long short-term memory; multi-layer perceptron
    DOI:  https://doi.org/10.3390/healthcare13172138
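The adaptive synthetic oversampling in entry 9 generates new minority-class samples rather than duplicating existing ones. The core step, shared with SMOTE, interpolates between a minority point and one of its minority neighbors; ADASYN additionally biases how many samples are generated toward harder-to-learn regions. A stripped-down sketch of the interpolation step (illustrative only):

```python
import random

random.seed(0)  # reproducible toy example

def interpolate_minority(x, neighbor):
    """One synthetic minority sample drawn uniformly on the segment
    between a minority point and one of its minority-class neighbors."""
    lam = random.random()
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]

x, nb = [1.0, 2.0], [3.0, 2.0]
synth = interpolate_minority(x, nb)
print(synth)  # lies on the segment between x and nb
```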
  10. Diabetes Res Clin Pract. 2025 Sep 04:112453. pii: S0168-8227(25)00467-X. [Epub ahead of print]
       AIMS: The mixed-meal tolerance test (MMTT), though considered the gold standard for evaluating residual beta-cell function in type 1 diabetes mellitus (T1D), is impractical for routine use. We aimed to develop and validate a machine learning (ML) model to predict MMTT-stimulated C-peptide categories using routine clinical data.
    METHODS: Data from 319 individuals in the T1D Exchange Registry with complete MMTT and clinical information were analyzed. The cohort was randomly split into training (70%) and test (30%) sets. Five clinical variables-age at diagnosis, diabetes duration, HbA1c, non-fasting glucose, and non-fasting C-peptide-were selected via recursive feature elimination. Four ML algorithms (random forest [RF], XGBoost, LightGBM, and ordinal logistic regression) were trained with 10-fold cross-validation.
    RESULTS: The RF model showed the highest performance: AUC 0.94 (95% CI: 0.92-0.96), sensitivity 0.84 (95% CI: 0.80-0.89), and specificity 0.92 (95% CI: 0.90-0.94) in cross-validation. In the test set, AUC was 0.97, sensitivity 88%, and specificity 94%. Notably, 17.7% of individuals with undetectable non-fasting C-peptide had measurable levels after MMTT.
    CONCLUSIONS: This ML model provides a practical, non-invasive tool for estimating beta-cell function in T1D and is available online at https://cpeptide.streamlit.app.
    Keywords:  Beta-cell function; C-peptide; Clinical decision support systems; Machine learning; Mixed-meal tolerance test; Type 1 diabetes mellitus
    DOI:  https://doi.org/10.1016/j.diabres.2025.112453
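Entry 10 selects its five clinical variables via recursive feature elimination: repeatedly fit, score the features, and drop the weakest until the target count remains. A toy sketch using absolute correlation with the outcome as a stand-in importance score; real RFE would use the fitted model's coefficients or importances:

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def rfe(cols, y, n_keep):
    """Greedy elimination: drop the least-correlated feature until n_keep remain."""
    keep = list(range(len(cols)))
    while len(keep) > n_keep:
        scores = [abs(corr(cols[j], y)) for j in keep]
        keep.pop(scores.index(min(scores)))
    return keep

random.seed(1)
y = [random.gauss(0, 1) for _ in range(200)]
cols = [[v + random.gauss(0, 1) for v in y],        # strongly informative
        [0.2 * v + random.gauss(0, 1) for v in y],  # weakly informative
        [random.gauss(0, 1) for _ in y]]            # pure noise
print(rfe(cols, y, n_keep=1))  # the strongly informative feature survives
```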
  11. Nat Sci Sleep. 2025;17:2013-2025.
       Introduction: Type 2 diabetes (T2D) shows bidirectional relationships with polysomnographic measures. However, no studies have searched systematically for novel polysomnographic biomarkers of T2D. We therefore investigated whether state-of-the-art explainable machine learning (ML) models could identify new polysomnographic biomarkers predictive of incident T2D.
    Methods: We applied explainable ML models to longitudinal cohort study data from 536 males who were free of T2D at baseline and identified 52 cases of T2D at follow-up (mean 8.3, range 3.5-10.5 years). Beyond ranking biomarker importance, we explored how the explainable ML model approach can identify novel relationships, assist in hypothesis testing, and provide insights into risk factors.
    Results: The top five most predictive biomarkers included waist circumference, glucose, and three novel sleep biomarkers: the number of 3% desaturations in non-supine sleep, mean heart rate in supine sleep, and mean hypopnea duration. Explainable machine learning identified a significant association between the number of non-supine desaturation events (threshold of 19 events) and incident T2D (odds ratio = 2.4 [95% CI 1.2-4.8], P = 0.013). No significant associations were found using continuous or quartiled versions of non-supine desaturation. Additionally, the model provided an individualized risk factor breakdown, supporting a more personalized approach to precision sleep medicine.
    Conclusion: Explainable ML supports the role of established biomarkers and reveals novel biomarkers of T2D likely to help guide further hypothesis testing and validation of more robust and clinically useful biomarkers. Although further validation is needed, these proof-of-concept data support the benefits of explainable ML in prospective data analysis.
    Keywords:  explainable machine learning; obstructive sleep apnoea; polysomnographic biomarkers; type 2 diabetes
    DOI:  https://doi.org/10.2147/NSS.S512262
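The odds ratio and confidence interval in entry 11 come from a 2x2 exposure/outcome table; a minimal sketch with a Wald-type interval, on illustrative counts chosen to reproduce an OR of 2.4 (not the study's actual table):

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% Wald CI from a 2x2 table:
    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases."""
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)

# Illustrative counts only, not the study's data:
or_, lo, hi = odds_ratio_ci(a=20, b=80, c=10, d=96)
print(round(or_, 2), round(lo, 2), round(hi, 2))
```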
  12. JMIR Med Inform. 2025 Sep 09.
       Introduction: Diabetic nephropathy (DN), a severe complication of diabetes, is characterized by proteinuria, hypertension, and progressive renal function decline, potentially leading to end-stage renal disease. The International Diabetes Federation projects that by 2045, 783 million people will have diabetes, with 30%-40% of them developing DN. Current diagnostic approaches lack sufficient sensitivity and specificity for early detection and diagnosis, underscoring the need for an accurate, interpretable predictive model to enable timely intervention, reduce cardiovascular risks, and optimize healthcare costs.
    Methods: Our retrospective cohort study investigated 1,000 type 2 diabetes patients using data from electronic medical records collected between 2015 and 2020. The study design incorporated a sample of 444 patients with diabetic nephropathy and 556 without, focusing on demographics, clinical metrics such as blood pressure and glucose levels, and renal function markers. Data collection relied on electronic records, with missing values handled via multiple imputation and dataset balance achieved using SMOTE. Advanced machine learning algorithms, namely XGBoost, CatBoost, and LightGBM, were utilized for their robustness in handling complex datasets. Key metrics, including accuracy, precision, recall, F1 score, specificity, and area under the curve (AUC), were employed to provide a comprehensive assessment of model performance. Additionally, explainable machine learning techniques, such as LIME and SHAP, were applied to enhance the transparency and interpretability of the models, offering valuable insights into their decision-making processes.
    Results: XGBoost and LightGBM demonstrated superior performance, with XGBoost achieving the highest accuracy of 86.87%, a precision of 88.90%, a recall of 84.40%, an F1 score of 86.44%, and a specificity of 89.12%. LIME and SHAP analyses provided insights into the contribution of individual features to the models' decision-making processes, identifying serum creatinine, albumin, and lipoproteins as significant predictors.
    Conclusion: The developed machine learning model not only provides a robust predictive tool for early diagnosis and risk assessment of DN but also ensures transparency and interpretability, crucial for clinical integration. By enabling early intervention and personalized treatment strategies, this model has the potential to improve patient outcomes and optimize healthcare resource utilization.
    DOI:  https://doi.org/10.2196/64979
  13. Sensors (Basel). 2025 Aug 31;25(17):5372. [Epub ahead of print]
      Postprandial hyperglycemia, marked by blood glucose exceeding the normal range after a meal, is a critical indicator of progression toward type 2 diabetes in people with prediabetes and in healthy individuals. A key metric for understanding blood glucose dynamics after eating is the postprandial area under the curve (AUC). Predicting postprandial AUC in advance from a person's lifestyle factors, such as diet and physical activity level, and explaining the factors that affect postprandial blood glucose could allow an individual to adjust their behavioral choices to maintain normal glucose levels. In this work, we develop an explainable machine learning solution, GlucoLens, that takes sensor-driven inputs and uses advanced data processing, large language models, and trainable machine learning models to estimate postprandial AUC and predict hyperglycemia from diet, physical activity, and recent glucose patterns. We use data obtained from wearables in a five-week clinical trial of 10 adults who worked full-time to develop and evaluate the proposed computational model, which integrates wearable sensing, multimodal data, and machine learning. Our model takes multimodal data from wearable activity and glucose monitoring sensors, along with food and work logs, and provides an interpretable prediction of postprandial glucose patterns. GlucoLens achieves a normalized root mean squared error (NRMSE) of 0.123 in its best configuration. On average, the proposed technology provides 16% better predictive performance than the comparison models. Additionally, our technique predicts hyperglycemia with an accuracy of 79% and an F1 score of 0.749, and recommends treatment options to help avoid hyperglycemia through diverse counterfactual explanations. With systematic experiments and discussion supported by established prior research, we show that our method is generalizable and consistent with clinical understanding.
    Keywords:  continuous glucose monitoring; diabetes; hyperglycemia; large language models; machine learning; metabolic health
    DOI:  https://doi.org/10.3390/s25175372
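The NRMSE reported for entry 13 is the root mean squared error divided by a normalizer; the range of the true values is one common convention (normalizing by the mean or standard deviation is also used, and the paper's exact choice is not stated here). A minimal sketch on toy glucose values:

```python
def nrmse(y_true, y_pred):
    """RMSE normalized by the range of the true values (one common
    convention; other normalizers also appear in the literature)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return mse ** 0.5 / (max(y_true) - min(y_true))

# Toy postprandial glucose values (mg/dL), illustrative only:
y_true = [100.0, 140.0, 180.0, 120.0]
y_pred = [110.0, 130.0, 170.0, 130.0]
print(nrmse(y_true, y_pred))  # 0.125
```

Normalization makes the error comparable across subjects whose glucose excursions span different ranges, which matters in a small multi-subject trial like the one described.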