bims-aukdir Biomed News
on Automated knowledge discovery in diabetes research
Issue of 2025-08-03
fourteen papers selected by
Mott Given



  1. J Clin Med. 2025 Jul 20;14(14):5150. [Epub ahead of print]
      Background/Objectives: Diabetic retinopathy (DR) is a progressive microvascular complication of diabetes mellitus and a leading cause of vision impairment worldwide. Early detection and timely management are critical in preventing vision loss, yet current screening programs face challenges, including limited specialist availability and variability in diagnoses, particularly in underserved areas. This literature review explores the evolving role of artificial intelligence (AI) in enhancing the diagnosis, screening, and management of diabetic retinopathy. It examines AI's potential to improve diagnostic accuracy, accessibility, and patient outcomes through advanced machine-learning and deep-learning algorithms. Methods: We conducted a non-systematic review of the published literature to explore advancements in the diagnostics of diabetic retinopathy. Relevant articles were identified by searching the PubMed and Google Scholar databases. Studies focusing on the application of artificial intelligence in screening, diagnosis, and improving healthcare accessibility for diabetic retinopathy were included. Key information was extracted and synthesized to provide an overview of recent progress and clinical implications. Conclusions: Artificial intelligence holds transformative potential in diabetic retinopathy care by enabling earlier detection, improving screening coverage, and supporting individualized disease management. Continued research and ethical deployment will be essential to maximize AI's benefits and address challenges in real-world applications, ultimately improving global vision health outcomes.
    Keywords:  artificial intelligence; diabetes mellitus; diabetic retinopathy; healthcare accessibility; screening
    DOI:  https://doi.org/10.3390/jcm14145150
  2. Front Nutr. 2025;12:1612369
       Objective: This study aims to develop and validate a machine learning model that integrates dietary antioxidants to predict cardiovascular disease (CVD) risk in diabetic patients. By analyzing the contributions of key antioxidants using SHAP values, the study offers evidence-based insights and dietary recommendations to improve cardiovascular health in diabetic individuals.
    Methods: This study leveraged data from the U.S. National Health and Nutrition Examination Survey (NHANES) to develop predictive models incorporating antioxidant-related variables (including vitamins, minerals, and polyphenols) alongside demographic, lifestyle, and health status factors. Data preprocessing involved collinearity removal, standardization, and class imbalance correction. Multiple machine learning models were developed and evaluated using the mlr3 framework, with benchmark testing performed to compare predictive performance. Feature importance in the best-performing model was interpreted using SHapley Additive exPlanations (SHAP).
    Results: This study utilized data from 1,356 individuals with diabetes from NHANES, including 332 with comorbid CVD. After removing collinear variables, 27 dietary antioxidant features and 13 baseline covariates were retained. Among all models, XGBoost demonstrated the best predictive performance, with an accuracy of 87.4%, an error rate of 12.6%, and both AUC and PRC values of 0.949. SHAP analysis highlighted Daidzein, magnesium (Mg), epigallocatechin-3-gallate (EGCG), pelargonidin, vitamin A, and theaflavin 3'-gallate as the most influential predictors.
    Conclusion: XGBoost exhibited the highest predictive performance for cardiovascular disease risk in diabetic patients. SHAP analysis underscored the prominent contribution of dietary antioxidants, with Daidzein and Mg emerging as the most influential predictors.
    Keywords:  SHAP; cardiovascular disease; diabetes; dietary antioxidants; machine learning
    DOI:  https://doi.org/10.3389/fnut.2025.1612369
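A minimal sketch of the Shapley attribution principle that SHAP approximates: for a toy risk score over three dietary features (the feature names, baseline values, and the model itself are invented for illustration, not taken from the paper), exact Shapley values can be computed by enumerating feature coalitions, and they satisfy the additivity property that SHAP explanations rely on.

```python
from itertools import combinations
from math import factorial

# Hypothetical baseline (population-average) feature values.
BASELINE = {"daidzein": 1.0, "mg": 300.0, "egcg": 50.0}

def model(x):
    # Invented risk score with an interaction term; any black-box function works.
    return 5.0 - 0.004 * x["mg"] - 0.01 * x["egcg"] + 0.5 * x["daidzein"] \
           - 0.001 * x["daidzein"] * x["egcg"]

def shapley_values(x):
    """Exact Shapley attribution by enumerating all feature coalitions;
    features outside a coalition fall back to their baseline values."""
    feats = list(x)
    n = len(feats)
    phi = {}
    for f in feats:
        others = [g for g in feats if g != f]
        total = 0.0
        for k in range(n):
            for s in combinations(others, k):
                # Shapley weight |S|! (n-|S|-1)! / n! for coalition S.
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                with_f = {g: (x[g] if g in s or g == f else BASELINE[g]) for g in feats}
                without_f = {g: (x[g] if g in s else BASELINE[g]) for g in feats}
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi

patient = {"daidzein": 3.0, "mg": 420.0, "egcg": 120.0}
phi = shapley_values(patient)
# Additivity: baseline prediction + attributions reconstructs the prediction.
recon = model(BASELINE) + sum(phi.values())
print(round(recon, 6) == round(model(patient), 6))  # True
```

SHAP's tree and kernel explainers estimate the same quantities far more efficiently; exhaustive enumeration like this is only feasible for a handful of features.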
  3. Front Med (Lausanne). 2025;12:1636214
       Background: Early identification of Type 1 Diabetes Mellitus (T1DM) in pediatric populations is crucial for implementing timely interventions and improving long-term outcomes. Peripheral blood transcriptomic analysis provides a minimally invasive approach for identifying predictive biomarkers prior to clinical manifestation. This study aimed to develop and validate machine learning algorithms utilizing transcriptomic signatures to predict T1DM onset in children up to 46 months before clinical diagnosis.
    Methods: We analyzed 247 peripheral blood RNA-sequencing samples from pre-diabetic children and age-matched healthy controls. Differential gene expression analysis was performed using established bioinformatics pipelines to identify significantly dysregulated transcripts. Five feature selection methods (Lasso, Elastic Net, Random Forest, Support Vector Machine, and Gradient Boosting Machine) were employed to optimize gene sets. Nine machine learning algorithms (Decision Tree, Gradient Boosting Machine, K-Nearest Neighbors, Linear Discriminant Analysis, Logistic Regression, Multilayer Perceptron, Naive Bayes, Random Forest, and Support Vector Machine) were combined with selected features, generating 45 unique model combinations. Performance was evaluated using accuracy, precision, recall, and F1-score metrics. Model validation was conducted using quantitative polymerase chain reaction (qPCR) in an independent cohort of six children (three healthy, three diabetic).
    Results: Transcriptomic analysis revealed significant differential expression patterns between pre-diabetic and control groups. Four model combinations demonstrated superior predictive performance: Lasso+K-Nearest Neighbors, Elastic Net + K-Nearest Neighbors, Elastic Net + Random Forest, and Support Vector Machine+K-Nearest Neighbors. These models achieved high accuracy in predicting diabetes onset up to 46 months before clinical diagnosis. Both Elastic Net-based models achieved perfect classification performance in the validation cohort, demonstrating their potential as clinically viable diagnostic tools.
    Conclusion: This study establishes the feasibility of integrating peripheral blood transcriptomic profiling with machine learning for early pediatric T1DM prediction. The identified transcriptomic signatures and validated predictive models provide a foundation for developing clinically translatable, non-invasive diagnostic tools. These findings support the implementation of precision medicine approaches for childhood diabetes prevention and warrant validation in larger, multi-center cohorts to assess generalizability and clinical utility.
    Keywords:  childhood diabetes; machine learning; pediatric biomarkers; peripheral blood; transcriptomic analysis
    DOI:  https://doi.org/10.3389/fmed.2025.1636214
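The 45 model combinations in this study arise from crossing five feature-selection methods with nine classifiers. A sketch of how such a benchmark grid might be enumerated (the `evaluate` placeholder is hypothetical; a real run would fit and score each pipeline with the stated metrics):

```python
from itertools import product

selectors = ["Lasso", "ElasticNet", "RandomForest", "SVM", "GBM"]
classifiers = ["DecisionTree", "GBM", "KNN", "LDA", "LogisticRegression",
               "MLP", "NaiveBayes", "RandomForest", "SVM"]

def evaluate(selector, classifier):
    # Hypothetical stand-in: a real pipeline would select genes with `selector`,
    # train `classifier`, and return accuracy/precision/recall/F1.
    return {"selector": selector, "classifier": classifier, "f1": None}

grid = [evaluate(s, c) for s, c in product(selectors, classifiers)]
print(len(grid))  # 45 unique selector/classifier combinations
```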
  4. Sci Rep. 2025 Aug 01. 15(1): 28103
      This study aimed to develop a machine learning model that promptly identifies patients with diabetic foot ulcers (DFU) at risk of major amputation upon initial admission. A total of 598 DFU patients were admitted to a tertiary hospital in Beijing. We employed the synthetic minority oversampling technique (SMOTE) to address the class imbalance of the target variable in the original dataset. A Lasso regularization analysis identified 17 feature variables for inclusion in the model: age, diabetes duration, wound size, history of peripheral neuropathy, history of atrial fibrillation, white blood cell count, C-reactive protein (CRP), procalcitonin, glycated hemoglobin (HbA1c), myoglobin (Mb), troponin (Tn), blood urea nitrogen, serum albumin, triglycerides (TG), low-density lipoprotein cholesterol, multidrug-resistant infection, and vascular intervention. Risk prediction models were then developed independently from these feature variables using six machine learning algorithms: logistic regression, random forest, support vector machine, K-nearest neighbors, gradient boosting machine (GBM), and extreme gradient boosting (XGBoost). The performance of the six models was evaluated to select the best model for predicting the risk of major amputation. GBM was identified as the best predictive model (accuracy 0.9408, precision 0.9855, recall 0.8553, F1-score 0.9158, and AUC 0.9499). The model also ranks the importance of the feature variables associated with major amputation risk, the top five being multidrug-resistant infection, CRP, diabetes duration, Tn, and age. The GBM model is thus an effective machine learning approach for predicting the risk of major amputation in diabetic foot patients.
    Keywords:  Diabetic foot; Machine learning; Major amputation; Risk-factors
    DOI:  https://doi.org/10.1038/s41598-025-13534-x
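The oversampling step above can be illustrated with a minimal SMOTE-like routine: synthetic minority samples are interpolated between an existing minority point and one of its nearest neighbours. This is a sketch of the general technique under invented two-feature data, not the paper's implementation.

```python
import random

def smote_like(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority samples by interpolating toward a random
    one of each point's k nearest neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)

    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: dist(x, p))[:k]
        nb = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + t * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

# Hypothetical scaled two-feature minority class (e.g. CRP and wound size).
minority = [(0.1, 0.2), (0.15, 0.25), (0.2, 0.1), (0.3, 0.3)]
new_points = smote_like(minority, n_synthetic=6)
print(len(new_points))  # 6
```

Because each synthetic point lies on a segment between two real minority points, it stays inside the minority class's convex hull, which is what keeps SMOTE from inventing implausible cases.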
  5. SLAS Technol. 2025 Jul 28:100323. pii: S2472-6303(25)00081-0. [Epub ahead of print]
      Diabetic Retinopathy (DR) is a complication of diabetes that can cause vision impairment and lead to permanent blindness if left undiagnosed. The increasing number of diabetic patients, coupled with a shortage of ophthalmologists, highlights the urgent need for automated screening tools for early DR diagnosis. Among the earliest and most detectable signs of DR are microaneurysms (MAs). However, detecting MAs in fundus images remains challenging due to several factors, including image quality limitations, the subtle appearance of MA features, and the wide variability in color, shape, and texture. To address these challenges, we propose a novel preprocessing pipeline that enhances the overall image quality, facilitating feature learning and improving the detection of subtle MA features in low-quality fundus images. Building on this preprocessing technique, we further develop a lightweight Attention U-Net model that significantly reduces the number of model parameters while achieving superior performance. By incorporating an attention mechanism, the model focuses on the subtle features of MAs, leading to more precise segmentation results. We evaluated our method on the IDRID dataset, achieving a sensitivity of 0.81 and specificity of 0.99, outperforming existing MA segmentation models. To validate its generalizability, we tested it on the E-Ophtha dataset, where it achieved a sensitivity of 0.59 and specificity of 0.99. Despite its lightweight design, our model demonstrates robust performance under challenging conditions such as noise and varying lighting, making it a promising tool for clinical applications and large-scale DR screening.
    Keywords:  Attention U-net; Diabetic retinopathy; Image enhancement; Lightweight; Microaneurysm; Preprocess pipeline; Segmentation
    DOI:  https://doi.org/10.1016/j.slast.2025.100323
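The attention mechanism in an Attention U-Net gates skip-connection features with coefficients computed from a coarser gating signal, so the decoder concentrates on subtle structures such as microaneurysms. A heavily simplified scalar sketch (real gates use 1x1 convolutions over feature maps; the weights and pixel values below are hypothetical):

```python
import math

def attention_gate(x, g, wx, wg, psi, b=0.0):
    """Additive attention gate on per-pixel features:
    alpha = sigmoid(psi * relu(wx*x + wg*g + b)); output = alpha * x."""
    out, alphas = [], []
    for xi, gi in zip(x, g):
        q = max(0.0, wx * xi + wg * gi + b)        # ReLU of combined signals
        alpha = 1.0 / (1.0 + math.exp(-psi * q))   # attention coefficient
        alphas.append(alpha)
        out.append(alpha * xi)                     # re-weighted skip feature
    return out, alphas

# Skip-connection features x and a coarser gating signal g for a row of
# pixels; high g marks candidate lesion locations the gate should keep.
x = [0.1, 0.9, 0.2, 0.8]
g = [0.0, 1.0, 0.0, 1.0]
out, alphas = attention_gate(x, g, wx=1.0, wg=2.0, psi=4.0)
print([round(a, 2) for a in alphas])
```

Pixels supported by the gating signal receive coefficients near 1 and pass through; unsupported pixels are attenuated, which is how the model suppresses background while preserving faint MA features.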
  6. Front Endocrinol (Lausanne). 2025;16:1626203
       Background: Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease and poses a serious threat to public health. Although the proportion of patients with lean NAFLD is lower than that of patients with obese NAFLD, it should not be overlooked. This study aimed to construct interpretable machine learning models for predicting lean NAFLD risk in type 2 diabetes mellitus (T2DM) patients.
    Methods: This study enrolled 1,553 T2DM individuals who received health care at the First Affiliated Hospital of Ningbo University, Ningbo, China, from November 2019 to November 2024. Feature screening was performed using the Boruta algorithm and the Least Absolute Shrinkage and Selection Operator (LASSO). Linear discriminant analysis (LDA), logistic regression (LR), Naive Bayes (NB), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost) were used to construct risk prediction models for lean NAFLD in T2DM patients. The area under the receiver operating characteristic curve (AUC) was used to assess the predictive capacity of the model. Additionally, we employed SHapley Additive exPlanations (SHAP) analysis to unveil the specific contributions of individual features in the machine learning model to the prediction results.
    Results: The prevalence of lean NAFLD in the study population was 20.3%. Eight variables, including age, body mass index (BMI), and alanine aminotransferase (ALT), were identified as independent risk factors for lean NAFLD. Ten predictive factors, including BMI, ALT, and aspartate aminotransferase (AST), were screened for the construction of risk prediction models. The random forest model demonstrated superior performance compared to alternative machine learning (ML) algorithms, achieving an AUC of 0.739 (95% confidence interval [CI]: 0.676-0.802) in the training set, and it also exhibited the best predictive value in the internal validation set with an AUC of 0.789 (95% CI: 0.722-0.856). In addition, the SHAP method identified TG, ALT, GGT, BMI, and UA as the top five variables influencing the predictions of the RF model.
    Conclusion: The construction of lean NAFLD risk models based on the Chinese T2DM population, particularly the RF model, facilitates its early prevention and intervention, thereby reducing the risks of intrahepatic and extrahepatic adverse outcomes.
    Keywords:  interpretable machine learning; lean non-alcoholic fatty liver disease; predict risk; prediction model; type 2 diabetes mellitus
    DOI:  https://doi.org/10.3389/fendo.2025.1626203
  7. Sensors (Basel). 2025 Jul 19;25(14):4492. [Epub ahead of print]
      Ocular diseases can significantly affect vision and overall quality of life, with diagnosis often being time-consuming and dependent on expert interpretation. While previous computer-aided diagnostic systems have focused primarily on medical imaging, this paper proposes VisionTrack, a multi-modal AI system for predicting multiple retinal diseases, including Diabetic Retinopathy (DR), Age-related Macular Degeneration (AMD), Diabetic Macular Edema (DME), drusen, Central Serous Retinopathy (CSR), and Macular Hole (MH), as well as normal cases. The proposed framework integrates a Convolutional Neural Network (CNN) for image-based feature extraction, a Graph Neural Network (GNN) to model complex relationships among clinical risk factors, and a Large Language Model (LLM) to process patient medical reports. By leveraging diverse data sources, VisionTrack improves prediction accuracy and offers a more comprehensive assessment of retinal health. Experimental results demonstrate the effectiveness of this hybrid system, highlighting its potential for early detection, risk assessment, and personalized ophthalmic care. Experiments were conducted using two publicly available datasets, RetinalOCT and RFMID, which provide diverse retinal imaging modalities: OCT images and fundus images, respectively. The proposed multi-modal AI system demonstrated strong performance in multi-label disease prediction. On the RetinalOCT dataset, the model achieved an accuracy of 0.980, F1-score of 0.979, recall of 0.978, and precision of 0.979. Similarly, on the RFMID dataset, it reached an accuracy of 0.989, F1-score of 0.881, recall of 0.866, and precision of 0.897. These results confirm the robustness, reliability, and generalization capability of the proposed approach across different imaging modalities.
    Keywords:  Convolutional Neural Network (CNN); Graph Neural Network (GNN); Large Language Model (LLM); ocular diseases; ophthalmology; retinal image
    DOI:  https://doi.org/10.3390/s25144492
  8. Sci Rep. 2025 Jul 29. 15(1): 27695
      The prevalence of type 2 diabetes mellitus (T2DM) in Korea has risen in recent years, yet many cases remain undiagnosed. Advanced artificial intelligence models using multi-modal data have shown promise in disease prediction, but two major challenges persist: the scarcity of samples containing all desired data modalities and class imbalance in T2DM datasets. We propose a novel transfer learning framework to predict T2DM onset within five years, using two Korean cohorts (KoGES and SNUH). To utilize unpaired multi-modal data, our approach transfers knowledge between clinical and genetic domains, leveraging unpaired clinical data alongside paired data. We also address class imbalance by applying a positively weighted binary cross-entropy (BCE) loss and a weighted random sampler (WRS). The transfer learning framework improved T2DM prediction performance. Using WRS and weighted BCE loss increased the model's balanced accuracy and AUC (achieving test AUC 0.8441). Furthermore, combining transfer learning with intermediate data fusion yielded even higher performance (test AUC 0.8715). These enhancements were achieved despite limited paired multi-modal samples. Our framework effectively handles scarce paired data and class imbalance, leading to improved T2DM risk prediction. This approach can be adapted to other medical prediction tasks and integrated with additional data modalities, potentially aiding earlier diagnosis and better disease management in clinical settings.
    DOI:  https://doi.org/10.1038/s41598-025-05532-w
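The positively weighted BCE loss used above can be sketched in a few lines: scaling the positive-class term makes errors on rare positives (here, future T2DM cases) cost more than errors on the majority class. The labels, probabilities, and weight below are invented for illustration.

```python
import math

def weighted_bce(y_true, p_pred, pos_weight):
    """Binary cross-entropy with a positive-class weight, a standard remedy
    for class imbalance (the negative-class term is left unweighted)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y = [1, 0, 0, 0]           # one positive among four samples
p = [0.6, 0.2, 0.1, 0.3]   # hypothetical model probabilities
plain = weighted_bce(y, p, pos_weight=1.0)
weighted = weighted_bce(y, p, pos_weight=3.0)  # e.g. negative/positive ratio
print(weighted > plain)  # True: the rare positive now dominates the loss
```

A weighted random sampler attacks the same imbalance from the data side, drawing minority samples more often per batch; the paper combines both.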
  9. Sci Rep. 2025 Jul 29. 15(1): 27625
    Gatekeeper Consortium
      The accurate prediction of blood glucose is critical for the effective management of diabetes. Modern continuous glucose monitoring (CGM) technology enables real-time acquisition of interstitial glucose concentrations, which can be calibrated against blood glucose measurements. However, a key challenge in the effective management of type 2 diabetes lies in forecasting critical events driven by glucose variability. While recent advances in deep learning enable modeling of temporal patterns in glucose fluctuations, most of the existing methods rely on unimodal inputs and fail to account for individual physiological differences that influence interstitial glucose dynamics. These limitations highlight the need for multimodal approaches that integrate additional personalized physiological information. One of the primary reasons for multimodal approaches not being widely studied in this field is the bottleneck associated with the availability of subjects' health records. In this paper, we propose a multimodal approach trained on sequences of CGM values and enriched with physiological context derived from health records of 40 individuals with type 2 diabetes. The CGM time series were processed using a stacked Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (BiLSTM) network followed by an attention mechanism. The BiLSTM learned long-term temporal dependencies, while the CNN captured local sequential features. Physiological heterogeneity was incorporated through a separate pipeline of neural networks that processed baseline health records and was later fused with the CGM modeling stream. To validate our model, we utilized CGM values of 30 min sampled with a moving window of 5 min to predict the CGM values with a prediction horizon of (a) 15 min, (b) 30 min, and (c) 60 min. 
We achieved multimodal prediction results with a Mean Absolute Point Error (MAPE) of 14-24 mg/dL, 19-22 mg/dL, and 25-26 mg/dL for the Menarini sensor, and 6-11 mg/dL, 9-14 mg/dL, and 12-18 mg/dL for the Abbott sensor, at the 15, 30, and 60 min prediction horizons, respectively. The results suggest that the proposed multimodal model achieved higher prediction accuracy than unimodal approaches, with up to 96.7% prediction accuracy, supporting its potential as a generalizable solution for interstitial glucose prediction and personalized management in the type 2 diabetes population.
    Keywords:  Deep learning; Interstitial glucose prediction; Multimodal AI
    DOI:  https://doi.org/10.1038/s41598-025-07272-3
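The 30-min-input, fixed-horizon setup described above can be sketched as a sliding-window transform over a CGM series sampled every 5 minutes (the glucose trace below is invented):

```python
def make_windows(series, window=6, horizon=3):
    """Build (input, target) pairs from a CGM series sampled every 5 min:
    window=6 readings = 30 min of history; horizon=3 steps = 15 min ahead."""
    pairs = []
    for i in range(len(series) - window - horizon + 1):
        history = series[i:i + window]
        target = series[i + window + horizon - 1]  # value `horizon` steps later
        pairs.append((history, target))
    return pairs

# Hypothetical CGM trace (mg/dL), one reading every 5 minutes.
cgm = [110, 112, 115, 118, 121, 125, 128, 130, 131, 129, 126, 124]
pairs = make_windows(cgm)
print(len(pairs), pairs[0])
```

Setting `horizon=6` or `horizon=12` yields the 30 min and 60 min prediction tasks; each window would then be fed to the CNN-BiLSTM stream alongside the per-patient health-record features.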
  10. Front Oncol. 2025;15:1595553
       Background: Neuregulin 4 (NRG4) is a novel metabolic regulator closely associated with insulin resistance and thyroid dysfunction. However, its role in the pathogenesis of comorbid type 2 diabetes mellitus and hyperthyroidism (T2DM-FT) remains to be systematically elucidated. Given the complex clinical characteristics of T2DM-FT patients, traditional statistical methods are often insufficient to effectively analyze nonlinear relationships among multiple variables. Machine learning techniques have garnered widespread attention due to their advantages in modeling high-dimensional, heterogeneous data.
    Objective: This study aimed to evaluate the predictive capability of a support vector machine (SVM) model based on serum NRG4, combined with a convolutional neural network (CNN) and long short-term memory (LSTM)-based ultrasound feature classification (SVM-CNN+LSTM) model, for predicting the occurrence of FT in patients with T2DM.
    Methods: We studied 500 T2DM patients (60 with FT, 440 without) and 200 healthy controls, collecting data on demographics, disease characteristics, NRG4, and thyroid indices. Pearson correlation was used to identify features correlated with NRG4. A parameter-optimized SVM model (C=1, linear kernel) was constructed for structured data modeling. Additionally, a CNN+LSTM network was employed to extract spatial (thyroid morphology) and temporal (hemodynamics) features from ultrasound sequences. These features were then fused with biochemical indicators, such as NRG4, to develop the final SVM-CNN+LSTM multimodal predictive model.
    Results: Serum NRG4 levels in T2DM+FT patients were significantly higher than those in the healthy control group (4.44 ± 1.25 vs. 2.17 ± 0.48 μg/L, P < 0.05). NRG4 levels were positively correlated with HOMA-IR (r = 0.593), FT3 (r = 0.773), FT4 (r = 0.683), thyroid volume (r = 0.652), and the resistance index (RI) (r = 0.473) (P < 0.05). The optimized SVM model demonstrated a sensitivity of 86.23%, specificity of 90.33%, and an area under the curve (AUC) of 0.887. In contrast, the fusion model SVM-CNN+LSTM outperformed the SVM model across all metrics, achieving a sensitivity of 91.32%, specificity of 94.18%, and an AUC of 0.943 (P < 0.05).
    Conclusion: The SVM-CNN+LSTM multimodal model, which integrates serum NRG4 levels with ultrasound features, significantly enhances the predictive accuracy of hyperthyroidism in T2DM patients. This approach effectively reveals the multifactorial mechanisms underlying T2DM-FT comorbidity, providing a powerful tool for early clinical intervention.
    Keywords:  CNN+LSTM model; Nrg4; SVM; T2DM complicated by FT; classification; ultrasound images
    DOI:  https://doi.org/10.3389/fonc.2025.1595553
  11. J Appl Oral Sci. 2025;33:e20250211. pii: S1678-77572025000100436. [Epub ahead of print]
       OBJECTIVE: To evaluate factors influencing the response to periodontal therapy in patients with periodontitis and type 2 diabetes mellitus (DM) using machine learning (ML) techniques, considering periodontal parameters, metabolic status, and demographic characteristics.
    METHODOLOGY: We applied machine learning techniques to perform a post hoc analysis of data collected at baseline and a 6-month follow-up from a randomized clinical trial (RCT). A leave-one-out cross-validation strategy was used for model training and evaluation. We tested six algorithms: K-Nearest Neighbors, Decision Tree, Support Vector Machine, Random Forest, Extreme Gradient Boosting, and Logistic Regression. Model performance was assessed using accuracy, specificity, recall, and the area under the Receiver Operating Characteristic (ROC) curve (AUC).
    RESULTS: A total of 75 patients were included. In an initial exploratory data analysis, we observed three clusters of patients who achieved the clinical endpoint related to HbA1c values. HbA1c ≤ 9.4% was correlated with lower PD (r=0.2), CAL (r=0.1), and the number of sites with PD ≥5 mm (r=0.1) at baseline. We then induced AI classification models with different inductive biases. The best-fitting model was Random Forest, with an AUC of 0.83, an accuracy of 80%, a sensitivity of 64%, and a specificity of 87%. Our findings demonstrate that PD and CAL were the most important variables contributing to the predictive performance of the Random Forest model.
    CONCLUSION: The combination of nine baseline periodontal, metabolic, and demographic factors from patients with periodontitis and type 2 DM may indicate the response to periodontal therapy. Lower levels of full mouth PD, CAL, plaque index, and HbA1c at baseline increased the chances of achieving the endpoint for treatment at 6-month follow-up. However, all nine features included in the model should be considered for treatment outcome predictability. Clinicians may consider the characterization of periodontal therapy response to implement personalized care and treatment decision-making. Clinical trial registration ID: NCT02800252.
    DOI:  https://doi.org/10.1590/1678-7757-2025-0211
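Leave-one-out cross-validation, as used above, trains on n-1 patients and tests on the single held-out one, cycling through all n. A minimal sketch with a stand-in 1-nearest-neighbour classifier and invented toy data (the study itself compared six algorithms):

```python
def loocv_accuracy(X, y, predict):
    """Leave-one-out cross-validation: each sample is held out once and
    predicted from a model trained on the remaining n-1 samples."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)

def one_nn(train_X, train_y, x):
    # Hypothetical stand-in classifier: label of the nearest training point.
    d = [sum((a - b) ** 2 for a, b in zip(p, x)) for p in train_X]
    return train_y[d.index(min(d))]

# Invented features, e.g. (baseline PD, HbA1c); label = endpoint achieved.
X = [(2.0, 6.5), (2.2, 6.8), (5.5, 9.8), (5.8, 10.1), (2.1, 7.0), (5.6, 9.5)]
y = [1, 1, 0, 0, 1, 0]
acc = loocv_accuracy(X, y, one_nn)
print(acc)  # 1.0 on this cleanly separated toy set
```

LOOCV is a natural choice at n = 75: it uses nearly all the data for each fit at the cost of n training runs per model.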
  12. Front Immunol. 2025;16:1574157
       Background: Kidney stones are a common benign condition of the urinary system, characterized by high incidence and recurrence rates. Our previous studies revealed an increased prevalence of kidney stones among diabetic patients, suggesting potential underlying mechanisms linking these two conditions. This study aims to identify key genes, pathways, and immune cells that may connect diabetes and kidney stones.
    Methods: We conducted bulk transcriptome differential analysis using our sequencing data, in conjunction with the AS dataset (GSE231569). After eliminating batch effects, we performed differential expression analysis and applied weighted gene co-expression network analysis (WGCNA) to investigate associations with 18 forms of cell death. Differentially expressed genes (DEGs) were subsequently analyzed using 10 commonly used machine learning algorithms, generating 101 unique combinations to identify the final DEGs. Functional enrichment analysis was performed, alongside the construction of protein-protein interaction (PPI) networks and transcription factor (TF)-gene interaction networks.
    Results: For the first time, bioinformatics tools were utilized to investigate the close genetic relationship between diabetes and kidney stones. Among 101 machine learning models, S100A4, ARPC1B, and CEBPD were identified as the most significant interacting genes linking diabetes and kidney stones. The diagnostic potential of these biomarkers was validated in both training and test datasets.
    Conclusion: We identified three biomarkers (S100A4, ARPC1B, and CEBPD) that may play critical roles in the shared pathogenesis of diabetes and kidney stones. These findings open new avenues for the diagnosis and treatment of these comorbid conditions.
    Keywords:  bioinformatics; diabetes; kidney stone; machine learning; programmed cell death
    DOI:  https://doi.org/10.3389/fimmu.2025.1574157
  13. Sci Rep. 2025 Jul 25. 15(1): 27164
      Type 2 Diabetes Mellitus (T2DM) remains a significant global health challenge, underscoring the need for early and accurate risk prediction tools to enable timely interventions. This study introduces ECG-DiaNet, a multimodal deep learning model that integrates electrocardiogram (ECG) features with established clinical risk factors (CRFs) to improve the prediction of T2DM onset. Using data from the Qatar Biobank (QBB), we compared ECG-DiaNet against unimodal models based solely on ECG or CRFs. A development cohort (n = 2043) was utilized for model training and internal validation, while a separate longitudinal cohort (n = 395) with a median five-year follow-up served as the test set. ECG-DiaNet demonstrated superior predictive performance, achieving a higher area under the receiver operating characteristic curve (AUROC) compared to the CRF-only model (0.845 vs. 0.8217), which was statistically significant based on the DeLong test (p < 0.001), thus highlighting the added predictive value of incorporating ECG signals. Reclassification metrics reinforced these improvements, with a significant Net Reclassification Improvement (NRI = 0.0153, p < 0.001) and Integrated Discrimination Improvement (IDI = 0.0482, p = 0.0099), confirming the enhanced risk stratification. Furthermore, stratifying participants into Low-, Medium-, and High-risk categories revealed that ECG-DiaNet achieved a higher positive predictive value (PPV) in the high-risk group compared to CRF-only models. These findings, together with the non-invasive nature and wide accessibility of ECG technology, suggest the potential of ECG-DiaNet for clinical implementation. However, further validation using larger and more diverse datasets is needed to improve generalizability.
    DOI:  https://doi.org/10.1038/s41598-025-12633-z
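The categorical Net Reclassification Improvement reported above measures how often the new model moves events up a risk category and non-events down, minus movements the wrong way. A minimal sketch with invented risk categories and outcomes:

```python
def net_reclassification_improvement(old_cat, new_cat, events):
    """Categorical NRI = P(up|event) - P(down|event)
                       + P(down|non-event) - P(up|non-event)."""
    up_e = down_e = n_e = up_ne = down_ne = n_ne = 0
    for o, n, ev in zip(old_cat, new_cat, events):
        if ev:
            n_e += 1
            up_e += n > o     # event correctly moved to a higher category
            down_e += n < o
        else:
            n_ne += 1
            up_ne += n > o
            down_ne += n < o  # non-event correctly moved lower
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne

# Risk categories 0=Low, 1=Medium, 2=High under a baseline model and a
# hypothetical augmented model, with observed outcomes (all values invented).
old = [0, 1, 1, 2, 0, 1]
new = [1, 2, 1, 2, 0, 0]
ev  = [1, 1, 0, 1, 0, 0]
nri = net_reclassification_improvement(old, new, ev)
print(round(nri, 3))  # 1.0: every reclassification here went the right way
```

Real NRI values, like the 0.0153 reported, are far smaller: most patients keep their category, and a few move in each direction.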
  14. PLoS One. 2025;20(8):e0328906
      Diabetic foot ulcer (DFU) is a severe complication of diabetes, often leading to amputation due to poor wound healing and infection. The immune-related pathogenesis of DFU remains unclear, and therapeutic drugs are limited. This study aimed to explore the immune mechanisms of DFU and identify potential therapeutic drugs using machine learning and single-cell approaches. Through differential expression analysis of Gene Expression Omnibus (GEO) datasets, we identified 287 differentially expressed genes (DEGs), which were significantly enriched in IL-17 signaling and neutrophil chemotaxis pathways. Weighted gene co-expression network analysis (WGCNA) further pinpointed disease-associated modules containing 1,693 regulatory genes. Machine learning algorithms prioritized seven core genes (CCL20, CXCL13, FGFR2, FGFR3, PI3, PLA2G2A, and S100A8), with validation in an external dataset (GSE147890) and single-cell sequencing revealing their predominant expression in neutrophils and keratinocytes. Immune infiltration analysis demonstrated significant dysregulation in DFU patients, characterized by elevated proportions of memory B cells, M0 macrophages, activated mast cells, and neutrophils. Potential therapeutic compounds were identified using the Connectivity Map database and tested through molecular docking and dynamics simulations. The study pinpointed selegiline, L-BSO, flunisolide, PP-30, and fluocinolone as promising therapeutic agents, offering new insights into the pathogenesis of diabetic foot ulcers (DFU) and potential therapeutic strategies.
    DOI:  https://doi.org/10.1371/journal.pone.0328906