bims-aukdir Biomed News
on Automated knowledge discovery in diabetes research
Issue of 2025–10–19
thirteen papers selected by
Mott Given



  1. Sci Rep. 2025 Oct 14. 15(1): 35763
      Diabetic retinopathy (DR), a serious eye condition in diabetic patients, requires early and precise detection for effective treatment. Late diagnosis and poor blood sugar control exacerbate this condition, highlighting the need for improved diagnostic methods. We developed a novel algorithm combining advanced image processing with machine learning techniques, utilizing classifiers such as SVM, decision tree, logistic regression, and kNN. A key feature of our approach is the incorporation of Voronoi Diagrams, which enhances the algorithm's ability to analyze complex image patterns. The algorithm was tested on 800 eye (fundus) images. The decision tree-based classifier, a part of the algorithm, demonstrated high precision and reliability in predicting DR, achieving an AUC of 0.964. The integration of Voronoi Diagrams significantly improved accuracy and reliability across various classifiers. This study demonstrates that our algorithm, particularly the decision tree classifier, can diagnose DR with a level of accuracy comparable to established clinical benchmarks. The high AUC value confirms its effectiveness. Voronoi Diagrams notably enhanced the algorithm's performance, indicating a promising approach for refining AI tools in ophthalmology.
    Keywords:  Artificial Intelligence; Automated diagnosis; Diabetic retinopathy; Voronoi diagrams
    DOI:  https://doi.org/10.1038/s41598-025-87886-9
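Several entries in this issue report discrimination as an AUC, such as the 0.964 quoted above. As a minimal sketch of what that number measures, the snippet below computes AUC as the fraction of (positive, negative) pairs that a classifier ranks correctly, with ties counted as half. The function name `roc_auc`, the labels, and the scores are hypothetical illustrations, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = DR, 0 = no DR) and classifier scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based formulation would be the usual choice for larger datasets.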
  2. JMIR AI. 2025 Oct 08. 4 e68260
       Background: Early diagnosis of diabetes is essential for early interventions to slow the progression of dysglycemia and its comorbidities. However, among individuals with diabetes, about 23% were unaware of their condition.
    Objective: This study aims to investigate the potential use of automated machine learning (AutoML) models and self-reported data in detecting undiagnosed diabetes among US adults.
    Methods: Individual-level data, including biochemical tests for diabetes, demographic characteristics, family history of diabetes, anthropometric measures, dietary intakes, health behaviors, and chronic conditions, were retrieved from the National Health and Nutrition Examination Survey, 1999-2020. Undiagnosed diabetes was defined as having no prior self-reported diagnosis but meeting diagnostic criteria for elevated hemoglobin A1c, fasting plasma glucose, or 2-hour plasma glucose. The H2O AutoML framework, which allows for automated hyperparameter tuning, model selection, and ensemble learning, was used to automate the machine learning workflow. For comparative analysis, 4 traditional machine learning models (logistic regression, support vector machines, random forest, and extreme gradient boosting) were implemented. Model performance was evaluated using the area under the receiver operating characteristic curve.
    Results: The study included 11,815 participants aged 20 years and older, comprising 2256 patients with undiagnosed diabetes and 9559 without diabetes. The average age was 59.76 (SD 15.0) years for participants with undiagnosed diabetes and 46.78 (SD 17.2) years for those without diabetes. The AutoML model demonstrated superior performance compared with the 4 traditional machine learning models. The trained AutoML model achieved an area under the receiver operating characteristic curve of 0.909 (95% CI 0.897-0.921) in the test set. The model demonstrated a sensitivity of 70.26%, specificity of 90.46%, positive predictive value of 64.10%, and negative predictive value of 92.61% for identifying undiagnosed diabetes from nondiabetes.
    Conclusions: To our knowledge, this study is the first to utilize the AutoML model for detecting undiagnosed diabetes in US adults. The model's strong performance and applicability to the broader US population make it a promising tool for large-scale diabetes screening efforts.
    Keywords:  AutoML; machine learning; screening; self-report; undiagnosed diabetes
    DOI:  https://doi.org/10.2196/68260
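The sensitivity, specificity, positive predictive value, and negative predictive value reported above all derive from the four cells of a confusion matrix. The following sketch shows those definitions; the function name `screening_metrics` and the counts are hypothetical and chosen only for illustration, so they are not the study's test-set counts.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return the four standard screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are flagged
        "specificity": tn / (tn + fp),  # share of non-cases that are cleared
        "ppv": tp / (tp + fp),          # share of flagged people who are true cases
        "npv": tn / (tn + fn),          # share of cleared people who are non-cases
    }


# Hypothetical counts: 100 people with undiagnosed diabetes, 400 without.
metrics = screening_metrics(tp=70, fp=40, tn=360, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Sensitivity and specificity describe the test itself, whereas PPV and NPV also depend on how common the condition is in the screened population.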
  3. Front Med (Lausanne). 2025 ;12 1657889
       Introduction: Timely and accurate diagnosis of diabetes mellitus remains a pending challenge due to the diversity of patient data and the limitations of traditional screening methods.
    Objective: To propose a hybrid prediction framework incorporating Convolutional Neural Networks (CNNs) and Integrated Learning with a soft voting strategy to improve the accuracy, robustness and interpretability of diabetes diagnosis.
    Methods: The model was evaluated on two publicly available datasets: the UCI Pima Indians Diabetes dataset (768 samples, 8 features), the same dataset used to describe the Pima Indians (2,000 samples, 8 features) and the Tianchi Medical dataset (5,642 samples, 41 features). After missing-value imputation, z-score standardization, and min-max normalization, CNNs were used for deep feature extraction, followed by integration with multiple classifiers (Logistic Regression (LR), Support Vector Machines (SVM), Random Forest, AdaBoost, XGBoost, LightGBM, and CatBoost) via a weighted soft voting scheme. Training and testing sets were split 75:25, and hyperparameters for each classifier were tuned through grid search.
    Results: The proposed CNN-Voting integrated model consistently outperforms the individual models, achieving up to 98% accuracy, 0.99 F1 value and 99% recall on the largest dataset. Feature importance analysis revealed that blood glucose, body mass index (BMI), age, and urea were the features with the most predictive value, which was highly consistent with common knowledge in clinical medicine.
    Conclusion: This hybrid model not only improves predictive performance and generalisability, but also provides a scalable and interpretable solution for clinical decision support in diabetes management.
    Keywords:  convolutional neural networks; diabetes; feature extraction; machine learning; soft voting
    DOI:  https://doi.org/10.3389/fmed.2025.1657889
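The weighted soft voting step described in the Methods above can be illustrated in a few lines. This is a generic sketch of the technique, not the authors' implementation: each classifier contributes its class probabilities, the probabilities are averaged using per-classifier weights, and the class with the highest average wins. The function name `weighted_soft_vote`, the probabilities, and the weights are hypothetical.

```python
def weighted_soft_vote(prob_rows, weights):
    """Combine per-classifier class probabilities by weighted averaging.

    prob_rows[i][k] is the probability classifier i assigns to class k.
    Returns the winning class index and the averaged probabilities.
    """
    if len(prob_rows) != len(weights):
        raise ValueError("one weight per classifier is required")
    total = sum(weights)
    n_classes = len(prob_rows[0])
    avg = [
        sum(w * row[k] for w, row in zip(weights, prob_rows)) / total
        for k in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda k: avg[k])
    return label, avg


# Hypothetical outputs of three classifiers for one patient:
# columns are [P(no diabetes), P(diabetes)].
probs = [
    [0.60, 0.40],
    [0.30, 0.70],
    [0.45, 0.55],
]
weights = [1.0, 2.0, 1.0]  # assumed weights, e.g. based on validation performance
label, avg = weighted_soft_vote(probs, weights)
print(label, [round(p, 4) for p in avg])
```

Unlike hard (majority) voting, soft voting lets a confident classifier outweigh several lukewarm ones, which is why the choice of weights matters.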
  4. Sci Rep. 2025 Oct 16. 15(1): 36202
      This study aims to identify risk factors associated with diabetic peripheral neuropathy (DPN) in patients with type 2 diabetes mellitus (T2DM) and to develop a predictive model to support clinical decision-making. A total of 1,001 patients with T2DM were retrospectively enrolled from the Department of Endocrinology, First Affiliated Hospital of Xinjiang Medical University, between January 2023 and January 2024. All patients were residents of Xinjiang. Patients were divided into two groups according to the diagnosis of peripheral neuropathy: 603 patients with DPN and 398 without DPN (NDPN). Missing data were handled using the "VIM" and "mice" packages in R. Statistical analyses were performed using independent t-tests and chi-square tests. Two machine learning algorithms were used to identify key risk factors, and shared features were visualized using a Venn diagram. Subsequently, four diagnostic models (GBM, GLM, RF, and SVM) were constructed using the "caret" package, and their predictive performance was rigorously evaluated. Fifteen factors, including age, duration of diabetes mellitus (DM), body mass index (BMI), diastolic blood pressure (DBP), 2-h postprandial glucose (2hPG), total cholesterol (TC), triglycerides (TG), low-density lipoprotein cholesterol (LDL-C), triglyceride-glucose (TyG) index, blood urea, estimated glomerular filtration rate (eGFR), urinary uric acid, urinary creatinine, urinary microalbumin, and urinary albumin-to-creatinine ratio (UACR), were significantly associated with the occurrence of DPN (P < 0.05). SVM-RFE and LASSO regression identified seven core risk factors for model construction. The RF model achieved the best performance, with AUCs of 1.000 (95% CI 0.990-1.000) in the training set, 0.904 (95% CI 0.869-0.940) in the validation set, and 0.953 (95% CI 0.940-0.966) in the external dataset. To assess potential overfitting in the Random Forest model, we performed model simplification, bootstrap resampling to estimate confidence intervals, and DeLong's test (P = 0.748), all of which confirmed that the model maintained robust generalization performance rather than merely fitting the training data. This study successfully identified significant predictors of DPN using machine learning techniques and developed a validated diagnostic model. The model demonstrated high accuracy and may aid in early detection and clinical management of DPN.
    Keywords:  Diabetic peripheral neuropathy; Diagnostic efficacy; Machine learning; Random forest model; Risk prediction
    DOI:  https://doi.org/10.1038/s41598-025-19922-7
  5. Rev Endocr Metab Disord. 2025 Oct 14.
      Diabetic foot ulcers (DFUs) are among the most serious complications of diabetes mellitus, often resulting in infection, amputation, and increased mortality. Early detection is essential but remains difficult due to the complex interaction of neuropathy, vascular disease, and immune dysfunction. This review examines the effectiveness of thermal imaging, including approaches supported by artificial intelligence (AI), as a non-invasive tool for identifying early signs of DFUs. A total of 49 studies published between 1991 and 2024 were analysed, focusing on adult patients and primary research only. Findings show that thermal imaging can detect abnormal skin temperature patterns and early inflammation, key indicators of DFU development. AI techniques, such as machine learning and neural networks, further enhance diagnostic accuracy by identifying subtle patterns and predicting ulcer risk. Despite promising results, several limitations were noted: lack of standardised imaging protocols, inconsistent equipment quality, and small sample sizes in many studies. To improve clinical reliability, future work should focus on developing standard procedures, integrating AI with high-resolution thermal cameras, and validating these systems in real-world hospital and home-care settings. Overall, thermal imaging, especially when combined with AI, shows strong potential as a practical, non-invasive method for early DFU detection and monitoring.
    Keywords:  Diabetes mellitus; Diabetic foot ulcers; Diagnosis; Thermal imaging
    DOI:  https://doi.org/10.1007/s11154-025-09999-w
  6. J Voice. 2025 Oct 13. pii: S0892-1997(25)00400-X. [Epub ahead of print]
       OBJECTIVE: To develop and validate a multimodal, machine learning-based framework that integrates acoustic voice features with baseline clinical parameters for noninvasive and accurate screening of type 2 diabetes mellitus (T2DM).
    MATERIALS AND METHODS: We analyzed data from 3129 individuals, including 1158 with T2DM and 1971 without. Voice recordings were collected under standardized conditions and processed with the openSMILE toolkit to extract 88 acoustic features, encompassing prosodic, spectral, cepstral, and quality-related parameters. In parallel, 30 clinical features were obtained from demographic, anthropometric, biochemical, lifestyle, and medical history variables. After preprocessing and imputation, feature selection was conducted using LASSO, ANOVA, Mutual Information, and Recursive Feature Elimination. Dimensionality reduction with Principal Component Analysis was also evaluated. Models, including Logistic Regression, Random Forest, XGBoost, TabNet, and TabTransformer, were trained with cross-validation and tuned through grid and randomized searches. Performance was assessed on an independent test set using accuracy, recall, and area under the curve (AUC). Model interpretability was addressed via SHAP analysis, t-SNE visualization, and radar plots. Clinical utility was assessed with nomogram construction, calibration, and decision curve analysis (DCA).
    RESULTS: Models using clinical features alone achieved moderate performance (AUC ≈ 69%). Acoustic-only models performed better, with the LASSO + XGBoost combination reaching an AUC of 80.8%. The fused feature set markedly outperformed both unimodal approaches, with the LASSO + XGBoost model achieving 94.1% accuracy, 93.6% recall, and an AUC of 95.2%. SHAP analysis identified HbA1c, fasting glucose, HOMA-IR, and acoustic markers such as jitter and shimmer as top predictors. Calibration plots showed excellent agreement between predicted and observed probabilities, while DCA demonstrated superior net clinical benefit.
    CONCLUSIONS: Our multimodal framework provides an accurate, interpretable, and clinically actionable approach for noninvasive T2DM screening. Future studies should validate these findings in diverse populations and explore integration into real-world digital health platforms.
    Keywords:  Acoustic analysis; Decision curve analysis; Machine learning; Nomogram; Type 2 diabetes mellitus; Voice biomarkers
    DOI:  https://doi.org/10.1016/j.jvoice.2025.09.033
  7. J Med Syst. 2025 Oct 14. 49(1): 139
      This study presents a machine learning-driven model predicting all-cause mortality two years in advance using administrative health data focused on diabetic patients. Integrating hospitalization records, emergency department data, demographics, and chronic disease information for 1553 variables, the study utilizes XGBoost, achieving an AUC of 0.89, which comparatively surpasses existing models. The research emphasizes the machine learning model's efficacy in capturing intricate mortality risk relationships and highlighting risk factors. While prior models often relied on specific cohorts or limited variables, this model, based on commonly available variables in primary care data, displays robust discrimination and calibration. Additionally, it highlights significant predictors such as age, immigration status, diagnosis age of comorbidities, number of comorbidities, and durations of comorbidities, aiding in early risk identification. The study suggests a potential for enhanced patient management and resource allocation based on mortality risk predictions for diabetic populations, showcasing the impact of machine learning in healthcare.
    Keywords:  Diabetes mellitus; EHR data; Machine learning; Mortality; XGBoost
    DOI:  https://doi.org/10.1007/s10916-025-02278-w
  8. Endocr Connect. 2025 Oct 14. pii: EC-25-0353. [Epub ahead of print]
       Background: Type 2 diabetes mellitus (T2DM) poses a significant global public health burden, where early detection of at-risk populations is imperative for implementing targeted preventive strategies. This systematic review and meta-analysis aimed to evaluate the methodological quality and predictive performance of existing T2DM risk prediction models in screening contexts.
    Methods: Following the TRIPOD-SRMA statement, eligible studies were identified by searching seven databases (CNKI, WanFang Database, VIP, PubMed, Embase, Web of Science, and the Cochrane Library) from database inception through December 2024. Methodological quality was assessed using the PROBAST tool. Random-effects models synthesized discrimination (AUC). Subgroup analyses explored geographic, modeling, and validation-related heterogeneity. Funnel plots and Egger's regression test assessed small-study effects.
    Results: A total of 65 studies (encompassing 97 distinct prediction models) were included in the analysis. Among the 97 models, logistic regression dominated (97.9% of models), achieving moderate discrimination (AUC: 0.628-0.916), while machine learning (ML) models showed marginally higher AUCs (up to 0.998). Geographic and cohort disparities emerged, with USA-based models outperforming others (USA AUC: 0.97 vs. China AUC: 0.79) and poor performance in prediabetic cohorts (AUC: 0.72 vs. 0.80 in normoglycemic). External validation remained limited (21 models), though spatial/temporal validation cohorts demonstrated stable performance. High risk of bias and applicability concerns (> 80% of models) stemmed from inadequate statistical reporting and external verification definitions.
    Conclusion: ML has a favorable diagnostic accuracy for the progression of T2DM. This provides evidence for the development of predictive tools with broader applicability. Future research should prioritize external validation to enhance precision.
    Keywords:  machine learning; meta-analysis; risk prediction models; screening; type 2 diabetes mellitus
    DOI:  https://doi.org/10.1530/EC-25-0353
  9. BMC Med Ethics. 2025 Oct 17. 26(1): 140
       BACKGROUND: Artificial intelligence (AI) offers significant potential to drive advancements in healthcare; however, the development and implementation of AI models present complex ethical, legal, social, and technical challenges, as data practices often undermine regulatory frameworks in various regions worldwide. This study explores stakeholder perspectives on the development and deployment of AI algorithms for diabetic retinopathy (DR) screening, with a focus on ethical risks, data practices, governance, and emerging shortcomings in the Global South AI discourse.
    METHODS: Fifteen semi-structured interviews were conducted with ophthalmologists, program officers, AI developers, bioethics experts, and legal professionals. Thematic analysis was guided by OECD principles for responsible AI stewardship. Interviews were analyzed using MAXQDA software to identify themes related to AI trustworthiness and ethical governance.
    RESULTS: Six key themes emerged regarding the perceived trustworthiness of AI: algorithmic effectiveness, responsible data collection, ethical approval processes, explainability, implementation challenges, and accountability. Participants reported critical shortcomings in AI companies' data collection practices, including a lack of transparency, inadequate consent processes, and limited patient awareness about data ownership. These findings highlight how unchecked data collection and curation practices may reinforce data colonialism in low and middle-income healthcare systems.
    CONCLUSION: Ensuring trustworthy AI requires transparent and accountable data practices, robust patient consent mechanisms, and regulatory frameworks aligned with ethical and privacy standards. Addressing these issues is vital to safeguarding patient rights, preventing data misuse, and fostering responsible AI ecosystems in the Global South.
    Keywords:  Artificial intelligence; Diabetic retinopathy screening; Trustworthiness
    DOI:  https://doi.org/10.1186/s12910-025-01265-7
  10. Sci Rep. 2025 Oct 15. 15(1): 35950
      Diabetic retinopathy (DR) is a foremost cause of vision impairment, characterized by the presence of microaneurysms, exudates, and other intricate retinal lesions. Traditional image segmentation methods often encounter challenges with these complex features due to poor boundary detection, noise sensitivity, and difficulties with low-contrast images. These limitations result in suboptimal segmentation, potentially compromising accurate diagnosis. This study introduces a hybrid segmentation approach that integrates K-Means clustering with Graph Cut optimization. K-Means clustering provides an initial segmentation by categorizing the image into distinct regions, while Graph Cut refines these regions by optimizing boundary delineations based on pixel similarity and spatial continuity. This integration effectively overcomes the noise sensitivity and boundary detection issues of traditional methods, especially in low-contrast scenarios. The proposed method has been verified on standard datasets, achieving a Mean Squared Error (MSE) of 0.044, a Peak Signal-to-Noise Ratio (PSNR) of 40.84 dB, and a Structural Similarity Index (SSIM) of 0.92. These results significantly surpass recent benchmarks reported in the literature, where existing methods typically achieve PSNR values between 30 and 35 dB, MSE values of 0.038 ± 0.007, and SSIM values around 0.85 to 0.90. The improvements in PSNR and SSIM underscore the superior image quality and structural preservation offered by the approach. By merging the strengths of K-Means and Graph Cut, the proposed hybrid method provides a robust, scalable, and computationally efficient solution for retinal image segmentation, enhancing the early detection of DR and supporting global ophthalmic care initiatives. By leveraging this efficient algorithm, the proposed work promotes innovation in healthcare technology with a focus on Sustainable Development Goal 9, ensuring accessible, accurate, and scalable solutions for ophthalmic care globally.
    Keywords:  Diabetic retinopathy; Graph cut; K-means; MSE; PSNR; Retinal image; SSIM
    DOI:  https://doi.org/10.1038/s41598-025-89262-z
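For readers unfamiliar with the image-quality metrics quoted above, here is a minimal sketch of how MSE and PSNR are conventionally computed between a reference image and a processed image. It uses the standard definition PSNR = 10 * log10(peak^2 / MSE). The abstract does not state the intensity scale or how the images were normalized, so the 8-bit peak value of 255 and the tiny 2x2 arrays are assumptions for illustration only.

```python
import math


def mse(reference, processed):
    """Mean squared error between two equally sized 2-D images (lists of rows)."""
    flat_ref = [px for row in reference for px in row]
    flat_out = [px for row in processed for px in row]
    if len(flat_ref) != len(flat_out):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(flat_ref, flat_out)) / len(flat_ref)


def psnr(reference, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit intensities."""
    err = mse(reference, processed)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / err)


# Hypothetical 2x2 grayscale patches.
reference = [[52, 55], [61, 59]]
processed = [[54, 55], [60, 62]]
print(f"MSE = {mse(reference, processed):.2f}")
print(f"PSNR = {psnr(reference, processed):.2f} dB")
```

Because PSNR depends on the chosen peak value, MSE and PSNR figures are only comparable across studies that use the same intensity scaling.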
  11. Int J Biomed Imaging. 2025 ;2025 6154285
      Various retinal conditions, such as diabetic macular edema (DME) and choroidal neovascularization (CNV), pose significant risks of visual impairment and vision loss. Early detection through automated, accurate, and advanced systems can greatly enhance clinical outcomes for patients as well as for medical staff. This study is aimed at developing a deep learning-based model for the early detection of retinal diseases using OCT images. We utilized a publicly available retinal image dataset comprising images with DME, CNV, drusen, and normal cases. The Inception model was trained and validated using various evaluation metrics. Performance metrics, including accuracy, precision, recall, and F1 score, were calculated. The proposed model achieved an accuracy of 94.2%, with precision, recall, and F1 scores exceeding 92% across all classes. Statistical analysis demonstrated the robustness of the model across folds. Our findings highlight the potential of AI-powered systems in improving early detection of retinal conditions, paving the way for integration into clinical workflows. More efforts are needed to utilize it offline by making it available on ophthalmologists' mobile devices to facilitate the diagnosis process and provide better service to patients.
    DOI:  https://doi.org/10.1155/ijbi/6154285
  12. NPJ Digit Med. 2025 Oct 16. 8(1): 612
      Continuous prediction of glucose levels and hypoglycemia events is critical for managing type 1 diabetes mellitus (T1DM) under intensive insulin therapy. Existing models focus on a single task, limiting their practicality and adaptability in automated insulin delivery (AID) systems. To address this, a domain-agnostic continual multi-task learning (DA-CMTL) framework that simultaneously performs glucose level forecasting and hypoglycemia event classification within a unified framework is proposed. Trained on simulated datasets via Sim2Real transfer and adapted using elastic weight consolidation, DA-CMTL supports cross-domain generalization. Evaluation on public datasets (DiaTrend, OhioT1DM, and ShanghaiT1DM) yielded a root mean squared error of 14.01 mg/dL, mean absolute error of 10.03 mg/dL, and sensitivity/specificity of 92.13%/94.28% on 30 min prediction. Real-world validation using diabetes-induced rats demonstrated a reduction in time below range from 3.01% to 2.58%, supporting reliable integration as a safety layer in AID systems. These results highlight DA-CMTL's robustness, scalability, and potential to improve safety in AID.
    DOI:  https://doi.org/10.1038/s41746-025-01994-4
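The forecasting and safety figures quoted above (root mean squared error, mean absolute error, and time below range) can be sketched as follows. The glucose readings are hypothetical, and the 70 mg/dL cutoff for "below range" is an assumed, commonly used convention; the abstract does not state which threshold the authors applied.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the inputs (mg/dL)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error, in the same units as the inputs (mg/dL)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def time_below_range(readings, threshold=70.0):
    """Percentage of readings below the threshold (assumed 70 mg/dL)."""
    return 100.0 * sum(1 for g in readings if g < threshold) / len(readings)


# Hypothetical CGM values and 30-minute-ahead forecasts, in mg/dL.
actual = [110, 95, 68, 72, 130]
predicted = [100, 100, 75, 70, 120]

print(f"RMSE = {rmse(actual, predicted):.2f} mg/dL")
print(f"MAE = {mae(actual, predicted):.2f} mg/dL")
print(f"Time below range = {time_below_range(actual):.1f}%")
```

RMSE penalizes large misses more heavily than MAE, so reporting both gives a sense of whether forecast errors are uniformly small or occasionally large.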