bims-aukdir Biomed News
on Automated knowledge discovery in diabetes research
Issue of 2026–05–17
24 papers selected by
Mott Given



  1. J Investig Med. 2026 May 13. 10815589261452671
      Aims: Diabetes mellitus is a global health challenge requiring innovative solutions for early diagnosis, personalized treatment, and ongoing management. This review aims to examine the impact of artificial intelligence (AI) on diabetes care, focusing on precision diagnosis, tailored therapies, and real-time monitoring, while addressing challenges related to model transparency and equitable access to healthcare. Methods: We conducted a comprehensive review of AI applications in diabetes management. Studies utilizing supervised and unsupervised learning, deep learning, federated learning, and reinforcement learning were analyzed for predictive accuracy, clinical impact, and integration. Comparisons with conventional methods were also included. Results: Machine learning models show strong predictive performance for diabetes risk assessment, with random forest algorithms reporting accuracy up to 97% in hospital-based datasets. Deep learning models applied to clinical cohorts have achieved approximately 94.6% accuracy in predicting adverse events in patients with type 2 diabetes. Reinforcement learning approaches for automated insulin delivery have maintained glucose within the normoglycemic range for up to 95.66% of simulated time in artificial pancreas studies. Federated learning enables privacy-preserving collaborative model development with performance comparable to centralized models. AI-driven decision-support systems and wearable technologies further support improved glycemic monitoring and patient self-management. However, model performance varies depending on dataset characteristics, patient populations, and evaluation protocols. Conclusions: AI has reshaped diabetes care by enabling precise diagnosis, individualized treatments, and adaptive disease management. Responsible implementation is essential to address ethical concerns and ensure equitable access. Future work should refine AI frameworks for broader clinical adoption, prioritizing patient-centered care and data security.
    Keywords:  Diabetes Mellitus; Diagnostic Imaging; Diagnostic Tests, Routine
    DOI:  https://doi.org/10.1177/10815589261452671
  2. Healthcare (Basel). 2026 Apr 28. pii: 1185. [Epub ahead of print]14(9):
      Background: Hospital readmission among patients with diabetes remains a major challenge for healthcare systems, contributing to increased costs and adverse patient outcomes. Early identification of high-risk patients may support targeted interventions and improved care management. Objectives: This study aimed to develop and rigorously evaluate a machine learning framework for predicting 30-day hospital readmission in patients with diabetes using a large multi-institutional clinical dataset. Methods: The study utilized the Diabetes 130-US Hospitals dataset from the UCI Machine Learning Repository, comprising 101,766 hospital encounters. Data preprocessing included missing-value handling and feature engineering. Several machine learning models were evaluated, including Logistic Regression, Random Forest, XGBoost, and LightGBM, alongside a stacking ensemble model. Model performance was assessed using nested cross-validation (5 outer folds, 3 inner folds), probability calibration via Platt scaling, and statistical robustness through 1000 bootstrap resamples. Clinical utility was evaluated using decision curve analysis and clinical impact curves, while SHAP analysis was applied for model interpretability. Results: The stacking ensemble model achieved a nested cross-validated ROC-AUC of 0.664 and a calibrated AUC of 0.688, with a Brier score of 0.094. Risk stratification demonstrated a clear gradient between low- and high-risk groups, and decision curve analysis indicated positive clinical net benefit across relevant decision thresholds. Conclusions: The proposed machine learning framework provides a robust and clinically interpretable approach for predicting 30-day hospital readmission in diabetic patients, with potential utility for supporting clinical decision-making and care management.
    Keywords:  XGBoost; clinical decision support; diabetes; hospital readmission; machine learning; predictive modeling
    DOI:  https://doi.org/10.3390/healthcare14091185
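      The evaluation described in entry 2 combines a Brier score with bootstrap resampling. The following is a minimal standard-library sketch of those two steps, not the authors' pipeline: the function names, the percentile-interval method, the seed, and the toy data are all assumptions made for illustration.

```python
import random


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_interval(y_true, y_prob, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(y_true, y_prob)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample cases with replacement, keeping outcome and probability paired.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[round((alpha / 2) * (n_resamples - 1))]
    upper = stats[round((1 - alpha / 2) * (n_resamples - 1))]
    return lower, upper


# Hypothetical toy data, not taken from the study.
outcomes = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
probabilities = [0.1, 0.2, 0.7, 0.1, 0.4, 0.3, 0.1, 0.6, 0.2, 0.1]

score = brier_score(outcomes, probabilities)
low, high = bootstrap_interval(outcomes, probabilities, brier_score)
print(f"Brier score {score:.3f}, 95% bootstrap interval [{low:.3f}, {high:.3f}]")
```

      A lower Brier score indicates predicted probabilities that sit closer to the observed outcomes; the interval only reflects sampling variability in the evaluated cases.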
  3. NPJ Digit Med. 2026 May 15.
      Population-based diabetic retinopathy (DR) screening requires diagnostic strategies that optimize clinical utility by balancing missed disease against referral burden. We performed a Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy studies (PRISMA-DTA)-guided systematic review and meta-analysis comparing autonomous artificial intelligence (AI) screening with store-and-forward (SAF) or conventional image-based teleophthalmology pathways, using manual, expert, or reading-center grading as the reference standard, across any DR, referable DR (RDR), vision-threatening DR (VTDR), and diabetic macular edema (DME). Twenty-eight diagnostic accuracy studies were included. AI showed higher pooled sensitivity than SAF for any DR (86.9% vs 80.9%), RDR (96.2% vs 88.6%), VTDR (96.2% vs 84.2%), and DME (97.2% vs 87.4%). AI also showed higher pooled specificity for any DR, RDR, and VTDR, whereas DME specificity was similar between pathways. Translating operating characteristics into decision consequences demonstrated that pathway preference depends on prevalence, decision thresholds, and misclassification weighting: at 15% prevalence, AI yielded higher net benefit (140.7 vs 120.8 net true-positive decisions per 1000 screened at pₜ = 0.10). These findings support pathway-specific deployment strategies rather than direct superiority claims.
    DOI:  https://doi.org/10.1038/s41746-026-02627-0
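      The net-benefit figures in entry 3 follow the general logic of decision curve analysis, where false positives are weighted by the odds of the decision threshold. The sketch below uses the standard formula, net benefit = sensitivity × prevalence − (1 − specificity) × (1 − prevalence) × pₜ / (1 − pₜ). The function name is hypothetical, and the specificity values in the example are placeholders because the abstract does not report them, so the output is not expected to reproduce the published numbers.

```python
def net_benefit_per_1000(sensitivity, specificity, prevalence, threshold):
    """Net true-positive decisions per 1000 screened at a given decision threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    # Each false positive costs threshold / (1 - threshold) true positives.
    weight = threshold / (1 - threshold)
    return 1000 * (true_positive_rate - weight * false_positive_rate)


# Sensitivities are the pooled RDR values from the abstract;
# the specificities (0.95 and 0.90) are hypothetical placeholders.
ai = net_benefit_per_1000(0.962, 0.95, prevalence=0.15, threshold=0.10)
saf = net_benefit_per_1000(0.886, 0.90, prevalence=0.15, threshold=0.10)
print(f"AI: {ai:.1f}, SAF: {saf:.1f} net true positives per 1000 screened")
```

      Because the false-positive penalty grows with the threshold and shrinks with prevalence, the same pair of operating points can rank differently across screening settings.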
  4. BMC Med Inform Decis Mak. 2026 May 11.
       BACKGROUND: Diabetic retinopathy (DR) is a leading cause of vision loss, yet conventional retinal screening remains costly and resource-intensive. This study developed and validated machine-learning (ML) models using routine laboratory data to provide a cost-effective, accessible alternative for DR risk stratification and triage.
    METHODS: We analyzed data from 750 patients (363 T2DM; 387 DR) and externally validated the findings with 451 additional cases. Fifty hematological and biochemical parameters were screened. Six algorithms were trained via five-fold cross-validation, with XGBoost emerging as the top performer. Model interpretability and feature selection were conducted using SHapley Additive exPlanations (SHAP) and ablation analysis.
    RESULTS: The XGBoost model achieved high discriminative performance (AUC = 0.87). Feature ablation identified a streamlined set of four key predictors: total cholesterol (TC), blood urea nitrogen (BUN), fibrinogen (FIB), and glucose (GLU). This reduced set maintained an AUC of 0.87. External validation confirmed robustness (AUC = 0.86) with balanced sensitivity (0.73) and specificity (0.80). Decision curve analysis indicated significant clinical utility, while SHAP provided individualized prediction transparency.
    CONCLUSIONS: Routine laboratory parameters effectively power ML models for DR prediction. The resulting web-based XGBoost tool offers an interpretable, accessible solution for adjunct risk scoring and early triage, particularly beneficial for prioritizing high-risk patients in community and primary-care settings where specialized retinal imaging is unavailable.
    Keywords:  Diabetic retinopathy; Machine learning; Risk stratification; SHAP
    DOI:  https://doi.org/10.1186/s12911-026-03524-y
  5. Sensors (Basel). 2026 Apr 25. pii: 2675. [Epub ahead of print]26(9):
      Blood glucose level (BGL) prediction, by providing early warnings regarding unsatisfactory glycaemic control and maximising the amount of time BGL remains in the target range, can contribute to minimising both acute and chronic complications related to diabetes. This paper aims to provide an overview of data-driven approaches for BGL prediction in type 1 diabetes mellitus (T1DM). This review summarises different aspects of developing and evaluating data-driven prediction models, including model strategy, model input, prediction horizon, and prediction performance. It also examines applications of recent artificial intelligence (AI) techniques, including deep learning, transfer learning, ensemble learning, and causal analysis in the management of T1DM. Recent studies indicate that machine learning approaches often outperform classical time-series forecasting models in BGL prediction, particularly when using multivariate inputs. These findings also highlight the potential of advanced AI methods to improve prediction accuracy. Moreover, applying appropriate statistical analyses is essential to enable valid comparisons between different BGL prediction models, especially given the considerable inter-individual variability among people with T1DM. The development of efficient methods for integrating affecting variables into BGL prediction requires further research. Given the promising performance of advanced AI techniques and the rapid growth of AI innovation, continued exploration of cutting-edge AI strategies will be crucial for further improving BGL prediction models.
    Keywords:  artificial intelligence; blood glucose level; diabetes mellitus; time series forecasting
    DOI:  https://doi.org/10.3390/s26092675
  6. Diagnostics (Basel). 2026 Apr 22. pii: 1254. [Epub ahead of print]16(9):
      Background/Objectives: Diabetes is a chronic metabolic disorder affecting global health, where early prediction can significantly reduce disease severity. Methods: This research proposes an interpretable multi-metric fuzzy distance-based ensemble (MMFDE) that integrates multi-variant gradient-boosting classifiers (GBM, LightGBM, XGBoost, and AdaBoost) through a novel fuzzy fusion mechanism designed for intrinsic interpretability. Unlike conventional ensembles relying on opaque averaging or voting, MMFDE transforms base classifier predictions into a high-dimensional fuzzy space quantified via a weighted hybrid distance incorporating Euclidean, Manhattan, Chebyshev, and cosine metrics against ideal diabetic and non-diabetic reference vectors. These distances are translated into membership degrees using exponentially decaying functions, which give clinicians calibrated confidence scores for every prediction. Comprehensive SHAP analysis identifies important clinical risk factors (glucose, BMI, and diabetes pedigree function) that are concordant with the medical literature, thereby supporting greater clinical trust. Results: Experimental evaluations on two publicly available datasets, the Hospital Frankfurt Germany Diabetes Dataset (HFGDD) and the Pima Indians Diabetes Dataset (PIDD), show that MMFDE outperforms all base models, with an accuracy of 94.83% and an area under the curve (AUC) of 97.66% on HFGDD, and provides three levels of interpretability: geometric transparency via distance-based decisions, confidence-calibrated uncertainty estimates, and feature-level explanations via SHAP. The framework's confidence thresholds support risk-stratified clinical workflows: high-confidence predictions can be used for automated screening, while moderate- and low-confidence cases are flagged for clinician review. Conclusions: By demonstrating that high performance and interpretability need not be mutually exclusive, MMFDE advances trustworthy AI for clinical decision support, addressing the critical need for transparent and clinically actionable diabetes prediction systems.
    Keywords:  SHAP analysis; clinical decision support; diabetes prediction; eXplainable AI (XAI); ensemble learning; fuzzy fusion; gradient boosting; interpretable machine learning
    DOI:  https://doi.org/10.3390/diagnostics16091254
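      The fusion idea in entry 6 (a weighted hybrid distance to ideal reference vectors, turned into memberships by exponential decay) can be illustrated roughly as follows. This is a sketch under stated assumptions, not the published MMFDE: equal metric weights, all-ones and all-zeros reference vectors, a decay rate of 1.0, the zero-vector convention for cosine distance, and every function name are hypothetical choices.

```python
import math


def hybrid_distance(pred, ref, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of Euclidean, Manhattan, Chebyshev, and cosine distances."""
    diffs = [abs(a - b) for a, b in zip(pred, ref)]
    euclidean = math.sqrt(sum(d * d for d in diffs))
    manhattan = sum(diffs)
    chebyshev = max(diffs)
    norm_pred = math.sqrt(sum(a * a for a in pred))
    norm_ref = math.sqrt(sum(b * b for b in ref))
    if norm_pred == 0 or norm_ref == 0:
        # Assumed convention: cosine distance is 0 if both vectors are zero, else 1.
        cosine = 0.0 if norm_pred == norm_ref else 1.0
    else:
        dot = sum(a * b for a, b in zip(pred, ref))
        cosine = 1.0 - dot / (norm_pred * norm_ref)
    w_e, w_m, w_c, w_cos = weights
    return w_e * euclidean + w_m * manhattan + w_c * chebyshev + w_cos * cosine


def membership(distance, decay=1.0):
    """Exponentially decaying membership: 1 at distance 0, approaching 0 far away."""
    return math.exp(-decay * distance)


def fuse(pred):
    """Return (label, confidence) from base-classifier probabilities of diabetes."""
    m_pos = membership(hybrid_distance(pred, [1.0] * len(pred)))
    m_neg = membership(hybrid_distance(pred, [0.0] * len(pred)))
    if m_pos >= m_neg:
        return "diabetic", m_pos / (m_pos + m_neg)
    return "non-diabetic", m_neg / (m_pos + m_neg)


# Hypothetical outputs of four base classifiers for one patient.
print(fuse([0.9, 0.8, 0.95, 0.85]))
```

      The normalized membership is used here only as a relative confidence; whether such a score is calibrated would have to be checked against held-out data.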
  7. JMIR AI. 2026 May 12. 5 e87819
       Background: Type 2 diabetes (T2D) is a complex, chronic condition that imposes a substantial burden on health care systems. Prevention and early detection are critical to mitigating its impact. Automated machine learning (AutoML) models have the potential to predict individual risk and guide personalized interventions. However, their clinical deployment remains limited due to the retrospective nature of most datasets, a lack of external validation, and heterogeneity in variable selection.
    Objective: This study aimed to map AutoML approaches applied to T2D risk prediction, with a specific focus on their ability to integrate clinical, behavioral, environmental, and genomic data.
    Methods: A PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses)-guided rapid review was conducted across 6 databases (PubMed, Scopus, Web of Science, IEEE Xplore, Google Scholar, and Embase) to identify empirical studies (published between 2015 and 2025) that used AutoML tools for T2D prediction based on at least 2 data types (eg, clinical, behavioral, environmental, and genomic). Screening, data extraction, and synthesis were performed systematically by 2 independent reviewers, with arbitration by ChatGPT acting as an artificial intelligence-based third reviewer.
    Results: In total, 13 studies met the inclusion criteria. Methodological diversity ranged from conventional machine learning with manual feature selection to partially or fully automated pipelines using tools such as the Tree-Based Pipeline Optimization Tool, H2O AutoML, or Azure Machine Learning. Reported performance varied (area under the curve=0.74-0.99); however, external validation was uncommon. Behavioral and environmental data were only partially integrated, and no study incorporated genomic data despite its recognized potential. Most studies lacked transparency and reproducibility, with no public code or pipeline sharing.
    Conclusions: AutoML holds significant promise for improving T2D risk prediction through automation and model explainability. However, to support clinical adoption and generalizability, future AutoML pipelines must be developed using prospective, multicenter datasets; integrate diverse, harmonized data types, including genomics; and adhere to open science principles of transparency, reproducibility, and interpretability.
    Keywords:  AI; AutoML; artificial intelligence; automated machine learning; explainable artificial intelligence; machine learning validation; multimodal data integration; type 2 diabetes risk prediction
    DOI:  https://doi.org/10.2196/87819
  8. Digit Health. 2026 Jan-Dec;12:12 20552076261450332
       Objective: Rheumatoid arthritis (RA) and diabetes mellitus (DM) frequently coexist, yet the heterogeneity of RA-DM multimorbidity remains unclear. This study aims to develop an interpretable machine learning framework to reveal the phenotypic subgroups of RA-DM multimorbidity, providing a potential direction for precision public health interventions.
    Methods: Utilizing data from the National Health and Nutrition Examination Survey (1999-2018), we developed a Bayesian-optimized eXtreme Gradient Boosting (XGBoost) model to classify RA-DM multimorbidity status and compared it with other machine learning models. Shapley Additive Explanations (SHAP) was applied to interpret the optimal model and quantify the contributions of different features. A dual-clustering approach combining Self-Organizing Maps and K-means was used to identify RA-DM phenotypic subgroups with different feature contribution patterns based on SHAP profiles.
    Results: The optimized XGBoost model achieved the best classification performance, outperforming other models such as K-nearest neighbors, support vector machine, and logistic regression. SHAP analysis identified nine key contributing features (homocysteine, age, glucose, etc.) and revealed non-linear interactions among the features. The dual clustering based on SHAP values identified four distinct RA-DM phenotypes (inflammatory, metabolically protective, age-related, and non-obese protective), each exhibiting unique clinical and biochemical patterns.
    Conclusion: This study established an interpretable machine learning framework for identifying distinct phenotypes of RA-DM multimorbidity. These findings provide a data-driven basis for targeted interventions in precision public health, while offering a transferable paradigm for phenotype discovery in other multimorbid conditions.
    Keywords:  diabetes mellitus; machine learning and clustering; multimorbidity; phenotype identification; rheumatoid arthritis
    DOI:  https://doi.org/10.1177/20552076261450332
  9. Molecules. 2026 Apr 23. pii: 1390. [Epub ahead of print]31(9):
      Diabetic kidney disease (DKD) affects approximately 40% of patients with diabetes mellitus and remains a leading cause of end-stage renal disease worldwide. Early diagnosis and identification of therapeutic targets are critical for improving patient outcomes, yet reliable biomarkers are lacking. This study integrated transcriptomic data from the Gene Expression Omnibus (GEO) database (GSE96804, GSE30528, and GSE142025) with machine learning algorithms and Mendelian randomization (MR) to identify diagnostic biomarkers for DKD. Differentially expressed genes (DEGs) were identified and intersected with key modules from weighted gene co-expression network analysis (WGCNA). Four machine learning methods were applied for feature selection: least absolute shrinkage and selection operator (LASSO), random forest (RF), support vector machine-recursive feature elimination (SVM-RFE), and extreme gradient boosting (XGBoost). Five hub genes (SPP1, CD44, VCAM1, C3, and TIMP1) were identified at the intersection of these approaches. Two-sample MR analysis using eQTL data from the eQTLGen Consortium and kidney function GWAS from the CKDGen Consortium provided evidence supporting potential causal associations between SPP1, C3, and TIMP1 expression and estimated glomerular filtration rate decline. Immune infiltration analysis via CIBERSORT estimated elevated proportions of M1 macrophages and activated CD4+ memory T cells in DKD samples, with all five hub genes showing correlations with macrophage infiltration. A diagnostic model based on these five genes achieved a cross-validated area under the receiver operating characteristic curve (CV-AUC) of 0.938 in the discovery dataset and AUC values of 0.917 and 0.889 in two independent external validation cohorts. Drug-gene interaction analysis identified 10 candidate compounds targeting the hub genes. These findings provide a computational framework for identifying candidate diagnostic biomarkers and generating hypotheses regarding potential therapeutic targets for DKD; however, all results are derived from in silico analyses and require experimental validation, including qPCR, immunohistochemistry, and prospective clinical cohort studies, before clinical applicability can be established.
    Keywords:  Mendelian randomization; WGCNA; biomarker; diabetic kidney disease; diagnostic model; immune infiltration; machine learning; transcriptomics
    DOI:  https://doi.org/10.3390/molecules31091390
  10. Abdom Radiol (NY). 2026 May 12.
       OBJECTIVE: To develop and externally validate a multimodal artificial intelligence framework for opportunistic detection of preclinical type 2 diabetes mellitus (T2DM) from routine portal venous-phase abdominal CT in patients without recent laboratory testing.
    MATERIALS AND METHODS: In this multicenter retrospective study, 1257 adults without prior diabetes who underwent routine portal venous-phase abdominal CT were included. Patients were classified as preclinical T2DM or normal glucose tolerance based on fasting plasma glucose and oral glucose tolerance testing. Pancreatic segmentation was performed using an nnU-Net-based deep learning model with expert validation. From the segmented pancreas, 1708 Image Biomarker Standardization Initiative (IBSI)-compliant radiomic features were extracted following standardized preprocessing, reproducibility filtering, and ComBat harmonization. In parallel, multi-scale deep features were derived from five state-of-the-art encoder backbones, including transformer-based and segmentation-derived architectures. Clinical variables were incorporated to construct clinical-only, radiomics-only, deep-only, and multimodal fusion models. Six feature selection methods and five classifiers were systematically evaluated using stratified cross-validation.
    RESULTS: In 1257 patients (879 training, 378 external), preclinical T2DM cases were significantly older and had higher BMI, waist circumference, and glycemic indices than controls (all p < 0.001), with comparable demographics between cohorts despite protocol heterogeneity. Clinical-only models reached AUC 0.738. Radiomics improved discrimination (best AUC 0.792). Deep feature models performed better, led by MedFormer-v2 (AUC 0.834), significantly surpassing radiomics. Multimodal fusion achieved the highest external performance (best AUC 0.861). In a secondary analysis excluding all glycemic laboratory variables, the multimodal model still reached AUC 0.837, confirming the added value of imaging biomarkers for opportunistic detection. The stacking ensemble was well calibrated (AUC 0.856) and significantly outperformed three abdominal radiologists who evaluated CT images alone (mean AUC 0.671), while AI assistance improved readers' performance.
    CONCLUSIONS: Multimodal analysis of routine abdominal CT enables accurate and generalizable detection of preclinical T2DM, supporting opportunistic imaging-based metabolic risk assessment. A simplified nomogram was developed to support individualized risk estimation, although its interpretability remains partial due to the inclusion of a multimodal fusion score.
    Keywords:  Abdominal CT imaging; Artificial intelligence; Deep learning; External validation; Multimodal fusion; Pancreatic biomarkers; Preclinical type 2 diabetes prediction; Radiomic features
    DOI:  https://doi.org/10.1007/s00261-026-05536-8
  11. Hum Genomics. 2026 May 14.
       BACKGROUND: Diabetic retinopathy (DR) is a major cause of severe visual impairment, where early diagnosis and intervention are crucial to prevent irreversible damage. Given that obtaining biomarkers from ocular fluids is highly invasive, we here leverage bioinformatic approaches to identify novel biomarkers and potential therapeutic drugs in the peripheral blood of patients with DR.
    METHODS: The GSE221521 dataset, derived from peripheral blood leukocytes, was analyzed to identify differentially expressed genes (DEGs) among controls, diabetes mellitus (DM) and DR patients, with subsequent gene set enrichment analysis (GSEA). Weighted gene co-expression network analysis (WGCNA) was applied to screen DR-associated modules. Key genes were further filtered via protein-protein interaction (PPI) network and support vector machine-recursive feature elimination (SVM-RFE). Two SVM diagnostic models based on these key genes were constructed, trained on the GSE221521 training set, and validated on the test set using receiver operating characteristic (ROC) curve analysis. The potential of the identified key genes as diagnostic biomarkers was further verified in independent clinical samples. Immune infiltration patterns were compared across the three groups. Finally, potential therapeutic compounds for DR were predicted via Connectivity Map (CMap) and molecular docking.
    RESULTS: Differential expression analysis of the GSE221521 dataset identified 635 DEGs. GSEA showed that upregulated DEGs in DR patients (vs. DM) were significantly enriched in the MAPK, Insulin and VEGF signaling pathways. WGCNA identified the green and magenta modules as the top DR-associated modules, from which four (FLNA, TSC22D4, U2AF2, and HCFC1) and two (NUDC and NDUFS6) key genes were screened via PPI and SVM-RFE analysis, respectively. These key genes displayed strong diagnostic efficacy for differentiating DR from DM in our in-house clinical cohort. Immune infiltration analysis revealed significant upregulation of activated mast cells, monocytes and M0 macrophages, along with downregulation of resting mast cells, in DR patients. Integrated CMap-based screening and molecular docking further proposed VU-0400193-3, Tyrphostin-AG-1295 and TG100-115 as candidate drugs against DR.
    CONCLUSIONS: This study provides novel insights into the circulating biomarkers for diagnosis of DR, and predicts therapeutic compounds for this disease.
    Keywords:  Biomarkers; Diabetic retinopathy; Machine learning
    DOI:  https://doi.org/10.1186/s40246-026-00967-2
  12. Front Endocrinol (Lausanne). 2026 ;17 1835866
       Background: Occult diabetic kidney disease (DKD) is a subtle yet high-risk microvascular complication of type 2 diabetes mellitus (T2DM). Early-stage DKD often goes undetected because traditional screening markers remain within the normal range. This study aimed to develop and validate an explainable machine learning (ML) model using routine clinical and laboratory data for the early detection of occult DKD. Its potential value for primary care screening was also evaluated.
    Methods: This multicenter retrospective study included 1,916 hospitalized patients with T2DM. The derivation cohort consisted of 1,066 patients from Wanbei Coal-Electricity Group General Hospital and was used to train the model. An independent cohort of 850 patients from the First Affiliated Hospital of Anhui Medical University served for external validation. Thirty-two routine clinical variables were initially considered. Eight ML algorithms were compared to identify the optimal model. SHapley Additive exPlanations (SHAP) was employed to rank feature importance, reduce variables, and interpret the model. Finally, a quartile-based risk stratification system and a web-based tool were developed.
    Results: Among the eight algorithms, logistic regression (LR) showed the best performance. Using SHAP rankings, a simplified LR model was built with eight features: HGB, HbA1c, HTN, UA, sex, MicroVCs, CVD, and A/G. The model performed well in both the training cohort (AUC = 0.824) and the external validation cohort (AUC = 0.786). SHAP analysis identified HbA1c, uric acid (UA), and hemoglobin (HGB) as the top contributors. The risk stratification system demonstrated clear separation, with the incidence of occult DKD rising from 1.5% in the lowest-risk quartile (Q1) to 55.8% in the highest-risk quartile (Q4). Additionally, decision curve analysis demonstrated that the model provides substantial clinical net benefit, and the final model was implemented as an interactive web-based calculator for real-time risk assessment.
    Conclusion: An explainable ML model was successfully developed to accurately predict occult DKD using routine clinical data. The model combines good performance with clear interpretation. It may serve as a practical tool for large-scale screening and early intervention in primary care.
    Keywords:  clinlabomics; machine learning; occult diabetic kidney disease; primary care; risk stratification; uric acid
    DOI:  https://doi.org/10.3389/fendo.2026.1835866
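      A quartile-based risk stratification like the one in entry 12 can be sketched by cutting predicted risks at their quartiles and reporting the observed event rate in each group. This is an illustrative sketch only: the function name, the use of `statistics.quantiles` with its default method, and the toy data are assumptions, and the study's actual cut points are not given in the abstract.

```python
from statistics import quantiles


def quartile_incidence(risks, outcomes):
    """Observed event rate in each quartile (Q1 lowest risk, Q4 highest)."""
    cuts = quantiles(risks, n=4)  # three cut points between the four groups
    counts = [0, 0, 0, 0]
    events = [0, 0, 0, 0]
    for risk, outcome in zip(risks, outcomes):
        group = sum(risk > cut for cut in cuts)  # 0..3
        counts[group] += 1
        events[group] += outcome
    return {
        f"Q{i + 1}": (events[i] / counts[i] if counts[i] else None)
        for i in range(4)
    }


# Hypothetical predicted risks and observed outcomes (1 = event).
risks = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
print(quartile_incidence(risks, outcomes))
```

      A steadily rising event rate from Q1 to Q4 is the kind of separation the abstract reports (1.5% to 55.8%).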
  13. JMIR Form Res. 2026 May 15. 10 e81039
       Background: Complication risks in children and adolescents with type 1 diabetes (T1D) can lead to serious health outcomes if not detected early. Despite the availability of clinical data, there remains a gap in interpretable tools that support risk stratification in this age group, particularly in alignment with local clinical guidelines.
    Objective: The purpose of this study is to develop a clinically interpretable model that classifies the risk levels of T1D complications (acute, chronic, and low) using real-world data and expert clinical rules derived from the Saudi Diabetes Clinical Practice Guidelines.
    Methods: A pediatric T1D dataset comprising 306 patients was preprocessed through structured cleaning and feature engineering. Risk labels were constructed using Saudi Diabetes Clinical Practice Guidelines-derived rules. Feature selection was performed using a hybrid approach that combined SHAP (Shapley Additive Explanations) analysis with exhaustive feature selection. A decision tree model was trained and optimized via cross-validation, using the F1-score as the primary performance metric.
    Results: The final model achieved a high mean F1-score of 0.9876 with a low variance of 0.0189, using only 5 clinical features: BMI, hypoglycemia, disease duration, hemoglobin A1c, and impaired glucose metabolism. These features were consistently ranked as the most influential. The resulting decision tree offered a transparent logic path, enhancing its clinical interpretability and usability.
    Conclusions: This study demonstrates that a simple and interpretable model, guided by national clinical guidelines, can effectively predict the risk levels of T1D complications in children and adolescents. Its strong performance, clarity, and reliance on a small number of clinically meaningful features make it a promising candidate for integration into clinical decision support systems. This supports a shift toward predictive and personalized diabetes care.
    Keywords:  P4 medicine; SHAP analysis; Saudi Diabetes Clinical Practice Guidelines; Shapley Additive Explanations; children and adolescents; clinical decision support systems; complication risk classification; interpretable machine learning; predictive modeling; type 1 diabetes
    DOI:  https://doi.org/10.2196/81039
  14. J Korean Med Sci. 2026 May 11. 41(18): e4
       BACKGROUND: The rising incidence of youth-onset type 2 diabetes mellitus (T2DM), along with the risk of early cardiovascular complications, is concerning. Brachial-ankle pulse wave velocity (baPWV) and carotid intima-media thickness (cIMT) are noninvasive markers of arterial stiffness and atherosclerosis. They serve as important markers for cardiovascular risk assessment. This study aimed to apply machine learning models to identify high-risk individuals who may benefit from early and targeted cardiovascular screening.
    METHODS: This retrospective study included 129 patients with youth-onset T2DM who underwent baPWV and cIMT measurements between January 2018 and July 2024. High-risk groups were defined as having values of ≥ mean + 1 standard deviation for baPWV (arterial stiffness) and cIMT (atherosclerosis). Clinical predictors were evaluated using linear regression, logistic regression, and machine learning analyses. Multiple machine learning models were trained using oversampling and cross-validation techniques to enhance prediction performance.
    RESULTS: Among the models tested, the gradient boosting model with adaptive synthetic (ADASYN) oversampling achieved the best performance for predicting both arterial stiffness (accuracy 0.81) and atherosclerosis (accuracy 0.92). Age and hypertension were consistently identified as the most important factors for arterial stiffness. For atherosclerosis risk, traditional analysis identified dyslipidemia, male sex, and duration of illness as relevant factors; machine learning more clearly emphasized low-density lipoprotein cholesterol and triglyceride levels as key predictors of increased cIMT.
    CONCLUSION: Hypertension and age were consistent predictors of arterial stiffness, while atherosclerosis risk factors were further clarified with lipid parameters by machine learning analysis. These findings suggest that conventional and machine learning analyses offer complementary strengths. Their combined use may enable earlier detection of nuanced cardiovascular risk patterns and support early and targeted vascular screening in youth-onset T2DM.
    Keywords:  Adolescent; Carotid Intima-Media Thickness; Diabetes Mellitus, Type 2; Machine Learning; Pulse Wave Analysis
    DOI:  https://doi.org/10.3346/jkms.2026.41.e4
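      The high-risk definition in entry 14 (a value at or above the cohort mean plus one standard deviation) amounts to a simple labeling rule. The sketch below shows that rule on made-up numbers; the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, stdev


def flag_high_risk(values):
    """Label each value as high risk if it is >= mean + 1 sample standard deviation."""
    threshold = mean(values) + stdev(values)
    return threshold, [value >= threshold for value in values]


# Hypothetical measurements in arbitrary units, not data from the study.
measurements = [1, 2, 3, 4, 10]
threshold, flags = flag_high_risk(measurements)
print(f"threshold = {threshold:.2f}, high-risk flags = {flags}")
```

      With a rule like this the high-risk class is usually a small minority of the cohort, which is consistent with the abstract's use of oversampling before model training.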
  15. J Clin Med. 2026 Apr 25. pii: 3287. [Epub ahead of print]15(9):
      Background/Objectives: Type 2 diabetes mellitus (T2DM) is a multisystemic disease with overlapping metabolic, renal, and cardiovascular effects. Within the Diabetic@ project, which aims to characterize individuals with T2DM using real-world data extracted from electronic health records (EHRs), this substudy sought to develop a predictive model for two-year heart failure (HF) risk. Methods: Multicenter, retrospective study including T2DM individuals across eight Spanish hospitals (2013-2018). Data were extracted exclusively from EHRs' unstructured free text using clinical natural language processing (cNLP) and mapped to SNOMED CT. At inclusion, individuals were categorized as having or not having prevalent HF (pHF). Predictive modeling was performed in non-pHF individuals to assess the two-year risk of developing HF, termed incident HF (iHF). Logistic regression (LR), decision trees, random forest, and XGBoost were compared, selecting for accuracy and interpretability. Results: Of 588,756 individuals with T2DM, 84,197 (14.3%) had pHF. Among non-pHF, 353,371 (60%) were used for model development (90.7% training, 9.3% validation). iHF occurred in 13.6% of the training set and 11.4% of the validation set. Ischemic heart disease was present in 16.2% overall, 37.9% in pHF, and 12.6% in non-pHF. Glycosylated hemoglobin data were rarely reported (<15%). LR achieved the best performance (AUC-ROC 0.73) using 27 predictors. Reduced 12- and clinically refined 9-predictor models performed similarly, with the latter implemented in a web-based tool. Conclusions: Unstructured data from EHRs enabled development of a two-year HF risk model for individuals with T2DM, underscoring the potential of cNLP for risk stratification across the cardiovascular-renal-metabolic spectrum.
    Keywords:  electronic health records; heart failure; natural language processing; predictive model; real-world data; type 2 diabetes mellitus
    DOI:  https://doi.org/10.3390/jcm15093287
  16. Rev Endocr Metab Disord. 2026 May 13.
      Diabetic foot ulcers are serious skin wounds that affect many people with diabetes, often leading to severe infections or even the loss of a limb. This paper explores how artificial intelligence (computer programs that can learn from data) is changing the way doctors find and treat these wounds. By reviewing 68 recent studies, we looked at how these smart technologies analyze different types of medical images to help patients. Our findings show that AI can help doctors identify health risks much earlier than traditional methods. These computer tools are also excellent at measuring how a wound is healing and predicting which treatments will work best for each individual. Because AI can spot tiny patterns in images that the human eye might miss, it makes medical care more precise and consistent. In conclusion, using AI to manage diabetic foot wounds offers a powerful way to improve patient health. By helping doctors make better, data-driven decisions, this technology can lead to faster healing and reduce the risk of serious complications for people living with diabetes.
    Keywords:  Artificial intelligence; Diabetic foot ulcer; Diagnosis and treatment workflow; Multi-modal imaging; Radiomics
    DOI:  https://doi.org/10.1007/s11154-026-10041-w
  17. BMC Microbiol. 2026 May 12.
       BACKGROUND: Type 1 Diabetes Mellitus (T1D) has been increasingly associated with alterations in the gut microbiome. However, the impact of taxonomic resolution, feature selection strategies, and machine learning methods on microbiome-based prediction remains incompletely understood.
    METHODS: We analyzed publicly available 16S rRNA gene sequencing datasets from two geographic cohorts to evaluate microbiome-based prediction of T1D. Microbial features were constructed at multiple taxonomic levels and as full hierarchical taxonomic paths preserving phylogenetic structure. Machine learning models were trained using stratified cross-validation and cross-cohort validation frameworks. Feature selection was performed using Binary Particle Swarm Optimization (BPSO) to identify compact and predictive microbial signatures. Model performance was evaluated using AUC, Accuracy, F1 score, and Matthews Correlation Coefficient. Differential abundance analysis using the LinDA framework was used to support biological interpretation of selected taxa.
    RESULTS: Tree-based models, particularly Random Forest and XGBoost, achieved the strongest predictive performance across taxonomic representations. Taxonomic resolution influenced model behavior, with family-level features providing strong performance with compact feature sets, while higher-resolution representations did not consistently improve performance despite increased complexity. BPSO identified consistently selected taxa across validation frameworks, suggesting stable predictive signatures. Several of these taxa have been linked to inflammatory or metabolically altered gut environments. Cross-cohort validation showed reduced performance compared with within-study models, highlighting challenges in generalization.
    CONCLUSION: Machine learning combined with BPSO-based feature selection provides an effective framework for identifying predictive microbial signatures associated with T1D. Our findings highlight the importance of taxonomic resolution, feature stability, and cross-cohort validation in microbiome-based predictive modeling. Integrating evolutionary feature selection with machine learning and biological validation may improve the robustness and interpretability of candidate microbial signatures.
    Keywords:  Gut microbiome; Machine Learning; Type 1 Diabetes Mellitus
    DOI:  https://doi.org/10.1186/s12866-026-05113-5
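The evaluation metrics named in the abstract above (Accuracy, F1 score, and Matthews Correlation Coefficient) can all be derived from a binary confusion matrix. The following is a minimal sketch with hypothetical counts; it is not the authors' evaluation code, and AUC is omitted because it needs ranked scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, F1, and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts for a T1D-vs-control classifier (illustration only).
metrics = binary_metrics(tp=40, fp=10, tn=35, fn=15)
print(metrics)
```

MCC uses all four cells of the confusion matrix, which makes it less sensitive to class imbalance than accuracy alone.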
  18. Front Med (Lausanne). 2026 ;13 1801177
       Objective: This study aimed to develop and validate a multimodal prediction model integrating optical coherence tomography angiography (OCTA) and glycated hemoglobin (HbA1c) for assessing the 5-year risk of severe systemic complications in patients with diabetic retinopathy (DR).
    Methods: A total of 340 patients with type 2 diabetes and DR were retrospectively enrolled from January 2020 to December 2024. Participants were randomly allocated into training (n = 238) and validation (n = 102) sets at a 7:3 ratio. Univariate analysis, Least Absolute Shrinkage and Selection Operator (LASSO) regression, and multivariate logistic regression were applied to identify key predictors. Three models (logistic regression, gradient boosting machine, and convolutional neural network) were constructed. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) values were used to interpret the optimal model.
    Results: Baseline characteristics were balanced between training and validation sets (P > 0.05). The results of multivariate logistic regression analysis identified history of cardiovascular disease, duration of diabetes, HbA1c, foveal avascular zone (FAZ) area, and urinary albumin-to-creatinine ratio (UACR) as independent influencing factors for the development of systemic complications within 5 years (all P < 0.05). Among the models, the convolutional neural network exhibited superior discrimination and clinical net benefit in both training (AUC = 0.853, 95% CI: 0.797-0.909) and validation sets (AUC = 0.820, 95% CI: 0.706-0.933). SHAP analysis indicated that FAZ area contributed most to predictions, supporting model interpretability.
    Conclusion: A multimodal prediction model incorporating OCTA and HbA1c was successfully developed and validated. The convolutional neural network demonstrated optimal predictive performance and clinical utility, offering a quantitative tool for early identification of high-risk patients and individualized management planning.
    Keywords:  diabetic retinopathy; glycated hemoglobin; machine learning; optical coherence tomography angiography; risk prediction
    DOI:  https://doi.org/10.3389/fmed.2026.1801177
  19. Open Life Sci. 2026 Jan;21(1): 20251325
      Type 1 diabetes mellitus (T1DM) patients require lifelong insulin therapy; however, iatrogenic hypoglycemia remains a major clinical challenge, with high incidence in adults. This study evaluated the performance, methodological rigor, and clinical utility of hypoglycemia risk prediction models for adult T1DM patients to inform evidence-based risk management strategies. Following the Cochrane framework and PRISMA guidelines, 18 studies were identified. Data extraction and bias assessment were conducted using the PROBAST tool. The mean area under the curve (AUC) across individual models was 0.85. Meta-analysis of AUC values revealed a pooled AUC of 0.88 (95% CI: 0.88-0.89), indicating moderate-to-good predictive accuracy. Substantial heterogeneity was observed (I² = 99.82%, P < 0.001), mainly due to differences in prediction time windows, data sources, and validation strategies. Most studies (88.9%) showed high or unclear risk of bias, and clinical applicability was limited, with only one study meeting criteria for low bias and high applicability. While existing models show moderate predictive performance, significant methodological limitations exist. Future research should focus on optimizing study design, conducting multi-center investigations, developing interpretable AI, standardizing validation protocols, and integrating these models into clinical practice to improve hypoglycemia management.
    Keywords:  hypoglycemia; machine learning algorithms; meta-analysis; predictive models; systematic review; type 1 diabetes
    DOI:  https://doi.org/10.1515/biol-2025-1325
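The heterogeneity statistic reported in the abstract above (I²) is conventionally derived from Cochran's Q as I² = max(0, (Q - df) / Q) × 100. The sketch below assumes inverse-variance weights around a fixed-effect pooled estimate, which is the usual way Q is computed; the abstract does not state the pooling model, and the AUC values and standard errors are hypothetical.

```python
def cochran_q_i2(estimates, std_errors):
    """Return (pooled estimate, Cochran's Q, I-squared in percent).

    Assumption: inverse-variance weighting around a fixed-effect pooled mean.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Two hypothetical study AUCs with small standard errors (illustration only).
pooled, q, i2 = cochran_q_i2([0.80, 0.90], [0.01, 0.01])
print(round(pooled, 3), round(q, 1), round(i2, 1))
```

The example shows why very large I² values arise easily when individual estimates are precise: even a modest spread between studies is large relative to their standard errors.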
  20. J Neuroradiol. 2026 May 10. pii: S0150-9861(26)00150-1. [Epub ahead of print] 101563
       BACKGROUND AND PURPOSE: Type 1 diabetes mellitus (T1DM) usually begins early in life, and its development impacts brain functioning and cognitive processing. The present study examined spontaneous alterations in brain activity in young adults with T1DM.
    MATERIALS AND METHODS: Thirty-five T1DM participants and thirty-five matched healthy controls underwent resting-state functional magnetic resonance imaging to assess the fractional amplitude of low-frequency fluctuations (fALFF), regional homogeneity (ReHo), and voxel-mirrored homotopic connectivity (VMHC). Between-group comparisons and correlation analyses were performed and corrected for multiple comparisons via Gaussian Random Field theory (voxel p < .001, cluster p < .025). Support Vector Machine (SVM) models were trained using cluster-derived features identified in group comparisons, and performance was evaluated on a test set.
    RESULTS: The T1DM group showed increased ReHo values in occipital lobe areas and reduced ReHo values in cerebellar areas. Within the T1DM group, ReHo values in occipital regions were positively associated with intelligence quotient. VMHC analyses revealed decreased interhemispheric connectivity in the cerebellum and lentiform nucleus in T1DM participants relative to controls. No significant between-group differences in fALFF were observed. When cerebellar ReHo values were used as features, the SVM classifier achieved modest yet statistically significant discrimination between groups.
    CONCLUSIONS: T1DM influences spontaneous brain activity. Decreases in subcortical VMHC and ReHo suggest potential disruptions in connectivity, whereas increases in occipital ReHo may be consistent with a compensatory mechanism. The use of SVM provides preliminary evidence of its potential utility for future research aimed at detecting neural changes in this population.
    Keywords:  ReHo; Support Vector Machine; Type 1 diabetes mellitus; VMHC; fALFF; resting-state fMRI
    DOI:  https://doi.org/10.1016/j.neurad.2026.101563
  21. Sci Rep. 2026 May 21.
      This study investigated gender disparities in random blood glucose (RBS) levels among Pakistani adults with Type 2 Diabetes (T2D), examining biological and sociocultural determinants. A cross-sectional analysis was conducted of 300 age-matched adults with T2D (150 men, 150 women; age 35-60 years) from four tertiary hospitals in Peshawar, Pakistan (February-July 2023). RBS was measured via the Microlab-300 system (Beer-Lambert Law). Multivariate regression and machine learning models (Ridge Regression, Random Forest, Support Vector Regression (SVR), Neural Network, Polynomial Regression) with nested cross-validation were used to analyze associations between demographic factors and RBS. Women had significantly higher mean RBS than men (243.6 vs. 210.8 mg/dL, p < 0.001) and a higher prevalence of severe hyperglycemia (≥260 mg/dL: 38.7% vs. 12.0%). Gender alone explained 16.5% of RBS variance in simple linear regression. Age showed a moderate positive correlation with RBS (r = 0.587, p < 0.001). In multivariate analysis, female gender (β = 24.76, p < 0.001), age (β = 3.01 per year, p < 0.001), and BMI (β = 0.88, p = 0.034) were significant predictors, while family history showed a protective effect (β = -13.36, p < 0.001). Machine learning models using only demographic variables achieved moderate predictive performance (R² = 0.421-0.470), with Ridge Regression performing best (R² = 0.470, MAE = 23.68 mg/dL). Feature importance analysis identified age (70.9%), gender (17.8%), and BMI (8.9%) as the dominant predictors. Significant gender disparities exist in random blood glucose among Pakistani adults with T2D, with women exhibiting higher mean values and greater prevalence of severe hyperglycemia. Age, gender, BMI, and family history are important demographic determinants, but demographic factors alone explain less than half of RBS variance. These findings highlight the need for gender-sensitive diabetes management strategies in South Asia and emphasize the necessity of incorporating direct biomarkers in future prediction efforts.
    Keywords:  Gender differences; Glycemic control; Machine learning; Pakistan; Sociocultural determinants; South Asia; Type 2 diabetes mellitus
    DOI:  https://doi.org/10.1038/s41598-026-52654-w
  22. PLoS One. 2026 ;21(5): e0349026
       OBJECTIVE: Addressing the challenges in elucidating the mechanisms of complex diseases such as Type 2 Diabetes Mellitus (T2DM), this study aims to construct a domain-specific cross-medicine knowledge graph (CMKG) and develop a unified path scoring framework that couples graph embeddings with rule-based reasoning, enabling high-precision, interpretable prioritization and explanation of potential drug candidates.
    METHODS: First, multi-source biomedical data from Hetionet, SymMap, TCMBank, STRING, and TTD were integrated. Using Jaccard and overlap-based fusion strategies, entity alignment and relation consolidation were performed to construct a deep CMKG bridged by genes. Second, four graph embedding models (TransE, DistMult, ComplEx, and RotatE) were introduced for link prediction and evaluated using MRR and Hits@K. Finally, to overcome the interpretability limitations of black-box predictions, AnyBURL rule learning was combined with depth-first search (DFS). We innovatively introduced an Ingredient Specificity Index (ISI) and a hybrid path confidence calibration mechanism, constructing a unified path scoring system incorporating length decay, node/relation weights, and experimental evidence bonuses to screen the most critical mechanistic paths.
    RESULTS: The constructed CMKG contains 15 entity types (245,235 entities) and 52 relation types (7,155,373 triples), covering 709 core T2DM genes. Link prediction stability tests across multiple random seeds showed that the ComplEx model consistently performed best in handling complex multi-mapping relations (MRR = 0.213 ± 0.004, Hits@10 = 0.418 ± 0.003). Consequently, the fully converged ComplEx model (Peak Hits@10 = 0.48) was utilized for comprehensive prediction. Retaining the top 100 predictions, Abelmoschus manihot and Topiramate ranked highest among TCM herbs and modern medicine compounds, respectively. Path analysis based on the scoring system revealed deep multi-target mechanisms, including insulin signaling sensitization, inflammatory regulation, and chromatin/cell-cycle intervention.
    CONCLUSION: The proposed gene-bridged graph embedding and unified path scoring framework successfully translates probabilistic predictions into biologically traceable semantic explanations. Rigorous ablation and parameter sensitivity experiments confirm that the framework achieves a robust balance between knowledge coverage and explanatory specificity, providing a transparent, robust, and scalable methodological foundation for candidate drug prioritization in complex diseases.
    DOI:  https://doi.org/10.1371/journal.pone.0349026
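Several quantities named in the abstract above have simple standard definitions: the Jaccard and overlap coefficients used for entity fusion, and the MRR and Hits@K metrics used to evaluate link prediction. The sketch below shows those textbook definitions only; it is not the authors' pipeline, and the gene sets and ranks are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(a, b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over the 1-based ranks of the true entities."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of true entities ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


# Hypothetical gene sets attached to two candidate duplicate entities.
genes_a = {"INS", "IRS1", "AKT2"}
genes_b = {"IRS1", "AKT2"}
# Hypothetical ranks of the correct tail entity for four test triples.
ranks = [1, 2, 4, 20]

print(jaccard(genes_a, genes_b), overlap(genes_a, genes_b))
print(mean_reciprocal_rank(ranks), hits_at_k(ranks, 10))
```

The overlap coefficient reaches 1.0 whenever one set is contained in the other, which is why it is often paired with Jaccard when the sources being fused differ greatly in size.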
  23. Int J Mol Sci. 2026 Apr 29. pii: 3969. [Epub ahead of print]27(9):
      Cardiometabolic diseases, encompassing obesity, insulin resistance, type 2 diabetes (T2D), metabolic dysfunction-associated steatotic liver disease (MASLD), hypertension, and atherosclerotic cardiovascular disease (ASCVD), represent a vast continuum driven by multi-organ network dysregulation. Clinical risk assessment remains dominated by late-stage measures (e.g., fasting glucose, HbA1c, standard lipids). While these assessments dominate the literature and clinical trial endpoints, each incompletely captures early mechanistic risk, inter-individual heterogeneity, and differential response to interventions. Multiomics (genomics, epigenomics, transcriptomics, proteomics, metabolomics, lipidomics, microbiomics, and extracellular vesicle/exosome cargo profiling) expands the biomarker landscape but introduces translational barriers: high dimensionality, cohort heterogeneity, limited causal inference, and insufficient validation pipelines. AI-driven systems biology platforms can support cardiometabolic biomarker discovery and therapeutic translation by enabling systems-level biological inference across heterogeneous datasets, prioritizing mechanism and traceability over purely correlation-based models. GATC Health's Operon™ platform is described as a proprietary, AI-driven internal scientific computing platform designed to support therapeutic discovery and development decision-making across the pharmaceutical lifecycle, including evaluation of drug efficacy, safety, off-target effects, pharmacokinetics (PK), pharmacodynamics (PD), and overall development risk. Operon evolved from earlier generations of GATC Health's internal multiomic modeling systems (formerly referred to as the Multiomics Advanced Technology, MAT) and incorporates expanded data types, orchestration layers, validation workflows, and productization frameworks. Operon is operated by GATC scientists and generates structured, productized outputs (e.g., formal assessments, analyses, and decision frameworks) that are reviewed by experts. Operon methodologies have undergone internal validation and independent academic evaluation under blinded conditions, with reported classification performance (true positive rate 86% and true negative rate 91%) in controlled evaluation settings; these performance metrics should not be interpreted as guarantees of clinical success. This review provides a T2D-centered cardiometabolic biomarker landscape with cardiovascular extension and outlines how Operon-enabled multiomic integration and scenario-based simulation can support early screening, endotype stratification, mechanistic interpretation, and precision intervention design, including AI-guided polypharmacology strategies.
    Keywords:  artificial intelligence; biomarkers; cardiometabolic disease; cardiovascular disease; exosomes; extracellular vesicles; machine learning; multiomics; polypharmacology; precision medicine; systems biology; type 2 diabetes
    DOI:  https://doi.org/10.3390/ijms27093969