bims-aukdir Biomed News
on Automated knowledge discovery in diabetes research
Issue of 2025–09–07
eighteen papers selected by
Mott Given



  1. BMC Biomed Eng. 2025 Sep 02. 7(1): 12
      Diabetic retinopathy (DR) is a leading cause of blindness worldwide. Early identification and prompt treatment are crucial in preventing DR-related vision impairment. Manual screening of retinal fundus images is challenging and time-consuming. Additionally, there is a significant gap between the number of DR patients and the number of medical experts. Integrating machine learning (ML) and deep learning (DL) techniques is becoming a viable alternative to traditional DR screening techniques. However, the absence of a retinal dataset with standardized quality, the complexity of DL models, and the need for high computational resources remain challenges. Therefore, in this study, we analyze the research landscape of ML techniques in DR screening. Our work contributes in several respects. First, we identify and characterize readily available retinal fundus image datasets. Then, we discuss commonly used preprocessing techniques in DR screening. In addition, we analyze the progress of ML techniques in DR screening. Lastly, we discuss existing challenges and outline future directions.
    Keywords:  Computer vision; Deep learning; Diabetic retinopathy screening; Machine learning; Transfer learning
    DOI:  https://doi.org/10.1186/s42490-025-00098-0
  2. Clin Ophthalmol. 2025;19: 2889-2900
       Purpose: Given the high incidence of eye diseases, various artificial intelligence (AI) screening systems for retinal disorders have been developed. This study aimed to evaluate the diagnostic performance and clinical value of an AI-assisted system for large-scale screening of diabetic retinopathy (DR) and other fundus abnormalities in a real-world physical examination population.
    Methods: This retrospective study analyzed 54,353 fundus examination records collected from the local hospital in 2020. An AI-assisted system was used to screen for DR and other retinal abnormalities. Manual interpretation was conducted to validate AI predictions, and data were stratified by comorbidities and systemic risk factors.
    Results: Approximately 25% of individuals tested positive for fundus lesions. The AI-assisted system demonstrated high diagnostic performance, with a negative predictive value ≥96% and a positive predictive value ≥90%. Common abnormalities detected included retinal vascular sclerosis, drusen, maculopathy, optic cup enlargement, and hemorrhage. Higher positive detection rates were observed in individuals with a history of diabetes, hypertension, high myopia, and other systemic conditions, with detection rates increasing with disease duration.
    Conclusion: AI-assisted screening offers an effective, scalable approach for early DR detection and can also identify systemic diseases with retinal manifestations. Integration of AI with big data platforms enables timely intervention, especially in underserved areas. Building a multi-institutional DR data platform may revolutionize retinal disease management and improve patient outcomes. This study supports the clinical application of AI in enhancing diagnostic efficiency and targeting high-risk populations for early intervention.
    Keywords:  artificial intelligence; deep learning; diabetic retinopathy; early detection; fundus screening
    DOI:  https://doi.org/10.2147/OPTH.S538020
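    The predictive values reported in entry 2 follow directly from confusion-matrix counts. The sketch below shows the arithmetic only; the counts are hypothetical and are not taken from the study.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts."""
    # PPV: share of AI-positive records confirmed positive on manual review
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    # NPV: share of AI-negative records confirmed negative on manual review
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Hypothetical counts chosen for illustration (not the study's data)
ppv, npv = predictive_values(tp=90, fp=10, tn=288, fn=12)
```

    Both values depend on how common lesions are in the screened population, so they do not transfer unchanged to a cohort with a different positive rate.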
  3. Int J Cardiol. 2025 Aug 27. pii: S0167-5273(25)00871-X. [Epub ahead of print]442 133828
      
    Keywords:  Biomarkers; Diabetes; Heart failure; Inflammation; Machine learning
    DOI:  https://doi.org/10.1016/j.ijcard.2025.133828
  4. J Am Podiatr Med Assoc. 2025 Jul-Aug;115(4): pii: 23-137. [Epub ahead of print]
      
    DOI:  https://doi.org/10.7547/23-137
  5. PLoS One. 2025;20(9): e0328655
       BACKGROUND: Diabetes remains a major public health concern in the United States, with a complex interplay of behavioral, demographic, and clinical risk factors. This study aims to identify the three best-performing machine learning models for diabetes risk prediction and to visualize the most influential predictors affecting diabetes likelihood. By leveraging a large, representative dataset, the study contributes to evidence-based strategies for targeted prevention.
    METHODS: Data were obtained from the 2015 Behavioral Risk Factor Surveillance System (BRFSS), a nationally representative, population-based survey collecting information on health behaviors, chronic conditions, and preventive care. The analytical sample included 253,680 adult respondents and over twenty features encompassing sociodemographic variables (e.g., age, sex, race, income, education), health behaviors (e.g., smoking, physical activity, diet), and outcomes (e.g., BMI, hypertension, diabetes status). Eighteen machine learning models were trained and evaluated, including AdaBoost, Extra Trees Classifier, C5.0 Decision Tree, and CatBoost. Models were assessed using predictive accuracy and AUC scores. SHAP (SHapley Additive exPlanations) analysis was used to interpret the top model and examine how changes in key features influence diabetes risk.
    RESULTS: Among the evaluated models, the Extra Trees Classifier achieved the highest predictive accuracy (>90%) and an AUC of 0.99. AdaBoost and CatBoost also demonstrated strong performance. Feature importance analysis identified BMI, age, general health status, income, physical health days, and education as the top predictors. A nonlinear association between income and diabetes risk was observed, with the highest prevalence in individuals earning $20,000-$25,000. Risk was also elevated in individuals aged 65-69 and those reporting poor general health. Hypertension showed a strong positive correlation with diabetes risk.
    CONCLUSIONS: Machine learning models, particularly tree-based ensemble methods, offer robust tools for diabetes risk prediction. These findings support their integration into public health analytics for personalized risk assessment and data-driven prevention strategies.
    DOI:  https://doi.org/10.1371/journal.pone.0328655
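    Entry 5 interprets its top model with SHAP, which is built on Shapley values. As a rough illustration of the underlying idea, the sketch below computes exact Shapley values for a single record by enumerating feature coalitions. It assumes a toy three-feature risk function (`toy_model`, hypothetical) and a hand-picked baseline; it is not the SHAP library and not the study's Extra Trees model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of feature i to coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical risk score: BMI and age interact, income acts additively."""
    bmi, age, income = z
    return 0.02 * bmi + 0.01 * age + 0.001 * bmi * age - 0.005 * income


x = [32.0, 67.0, 2.0]          # record being explained (made-up values)
baseline = [27.0, 45.0, 5.0]   # reference record (made-up values)
phi = shapley_values(toy_model, x, baseline)
```

    The attributions sum to the difference between the model output for `x` and for `baseline`, which is the property that makes per-feature contributions comparable across records.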
  6. World J Methodol. 2025 Dec 20. 15(4): 107166
      Artificial intelligence (AI), encompassing machine learning and deep learning, is being extensively used in medical sciences. It is slated to positively impact the diagnosis and prognostication of various diseases. Deep learning, a subset of AI, has been instrumental in diagnosing diabetic retinopathy (DR), diabetic macular edema, glaucoma, age-related macular degeneration, and numerous other ocular diseases. AI performs equally well in the early prediction of glaucoma and age-related macular degeneration. Integrating AI with telemedicine promises to improve healthcare delivery, although challenges persist in implementing AI algorithms, especially in developing countries. This review provides a comprehensive summary of AI, its applications in ophthalmology, particularly DR, the diverse algorithms utilized for different ocular conditions, and prospects for the future integration of AI in eye care.
    Keywords:  Age-related macular degeneration; Alzheimer's disease; Artificial intelligence; Automatic retinal image analysis; Chronic kidney disease; Convolutional neural networks; Diabetic macular edema; Diabetic retinopathy; International council of ophthalmology; Machine learning; Massive training artificial neural networks; Natural language processing; OCT angiography; Optical coherence tomography; Vision transformers
    DOI:  https://doi.org/10.5662/wjm.v15.i4.107166
  7. Ophthalmol Sci. 2025 Nov-Dec;5(6): 100874
       Purpose: To develop a machine learning (ML) algorithm capable of determining cardiovascular (CV) risk in multimodal retinal images from patients with type 1 diabetes mellitus (T1DM), distinguishing between moderate, high, and very high-risk levels.
    Design: Cross-sectional analysis of a retinal image data set from a previous prospective OCT angiography (OCTA) study (ClinicalTrials.gov NCT03422965).
    Participants: Patients with T1DM included in the progenitor study.
    Methods: Radiomic features were extracted from color fundus photographs (CFPs), OCT, and OCTA images, and ML models were trained using these features either individually or combined with clinical data (demographics and systemic data, OCT + OCTA commercial software metrics, ocular data, blood data). Different data combinations were tested to determine the CV risk stages, defined according to international classifications.
    Main Outcome Measures: Area under the receiver operating characteristic curve mean and standard deviation for each ML model and each data combination.
    Results: A data set of 597 eyes (359 individuals) was analyzed. Models trained only with the radiomic features achieved area under the curve (AUC) values of (0.79 ± 0.03) to identify moderate risk cases from high and very high-risk cases, and (0.73 ± 0.07) for distinguishing between high and very high-risk cases. The addition of clinical variables improved all AUC values, obtaining (0.99 ± 0.01) for identifying moderate cases, and (0.95 ± 0.02) for differentiating between high and very high-risk cases. For very high CV risk, radiomics combined with OCT + OCTA metrics and ocular data achieved an AUC of (0.89 ± 0.02) without systemic data input. The performance of the models was similar in unilateral and bilateral eye image data sets.
    Conclusions: Radiomic features obtained from retinal images are helpful to discriminate and classify CV risk labels, differentiating risk categories. The addition of demographics and systemic data combined with ocular data differentiate high from very high CV risk cases, and interestingly OCT + OCTA metrics with ocular data identify very high CV risk cases without systemic data input. These results reflect the potential of this oculomics approach for CV risk assessment.
    Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
    Keywords:  Cardiovascular risk; Diabetes mellitus type I; Machine learning; Optical coherence tomography angiography; Radiomics
    DOI:  https://doi.org/10.1016/j.xops.2025.100874
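    Entry 7 reports each result as an AUC mean and standard deviation. A minimal sketch of that summary follows, assuming the values come from repeated evaluation folds (the abstract does not say how they were produced). The labels and scores are made up, and AUC is computed as the probability that a positive case outscores a negative one.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical evaluation folds: (true labels, predicted scores)
folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2]),
    ([0, 1, 0, 1], [0.6, 0.6, 0.2, 0.9]),
]
aucs = [roc_auc(y, s) for y, s in folds]
summary = (mean(aucs), stdev(aucs))  # reported as mean ± SD
```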
  8. J Med Eng Technol. 2025 Sep 05. 1-14
      Diabetic retinopathy is a chronic, progressive eye disease in which the retina is damaged by elevated blood glucose levels. If not detected and treated in time, it threatens the patient's vision and can eventually cause complete blindness. Among the various clinical signs, microaneurysms appear first. Accurate and reliable detection of microaneurysms is challenging because of their small size and low contrast, yet successful detection supports proper treatment of the disease in its early stages. In this paper, we present a method for classifying retinal images to accurately determine the stage of diabetic retinopathy. Our proposed method has six main steps; in steps one to four, the input image is pre-processed. In the first step, blood vessels are detected and segmented using the morphological closing operation. In the second step, circular edges are detected using the morphological gradient operation. In the third step, the optic disc is detected using the circular Hough transform. In the fourth step, microaneurysms are detected and segmented by removing blood vessels, circular edges, and the optic disc and then applying the circular Hough transform. In the fifth step, feature extraction is performed using two features, blood vessel area and microaneurysm area, and four features obtained from the gray-level co-occurrence matrix. Finally, the sixth step is classification using an SVM classifier with a Gaussian kernel. We evaluated the performance of the model on the EyePACS retinal fundus image database and obtained 95.20% accuracy and 97% specificity. Experimental results show that our proposed model performs better on the evaluated measures than other methods.
    Keywords:  Diabetic retinopathy; SVM; intelligent model; micro aneurysm; morphological; retinal image
    DOI:  https://doi.org/10.1080/03091902.2025.2553137
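    The fifth step in entry 8 draws four features from the gray-level co-occurrence matrix (GLCM). The abstract does not name them, so the sketch below assumes four common choices (contrast, energy, homogeneity, entropy) and a single horizontal offset. The tiny 4-level image is illustrative only.

```python
from math import log2


def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalised co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Four texture features (an assumed selection) from a normalised GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "entropy": -sum(v * log2(v) for _, _, v in cells if v > 0),
    }


# Illustrative 4x4 image quantised to 4 gray levels
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4)
features = glcm_features(p)
```

    In a pipeline like the one described, these values would be concatenated with the two area features before being passed to the classifier.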
  9. Sci Rep. 2025 Sep 02. 15(1): 32280
      Diabetic Retinopathy (DR) is a leading cause of blindness worldwide, and its early detection and accurate grading play a crucial role in clinical intervention. To address the dual limitations of existing methods in multi-scale lesions feature fusion and lesions relation modeling, this study proposes a novel adaptive multi-scale convolutional neural network model for fine-grained grading of DR, called MAFNet (Multi-scale Adaptive Fine-grained Network). The model is constructed through three core modules to establish a multi-scale feature integration framework: the Hierarchical Global Context Module (HGCM) effectively expands the receptive field by employing multi-scale pooling and dynamic feature fusion, capturing lesions features from micro to large-scale areas; the Multi-scale Adaptive Attention Module (MSAM) utilizes an adaptive attention mechanism to dynamically adjust the feature weights at different spatial locations, enhancing the representation of key lesions regions; and the Relational Multi-head Attention Module (RMA) uses a multi-head attention mechanism to model the complex relationships between features in parallel, improving the accuracy of fine-grained lesions identification. Furthermore, MAFNet adopts a multi-task learning framework, transforming the DR grading task into a dual-task structure of regression and classification, thereby effectively capturing the progression of DR. Extensive experiments on three publicly available datasets, DDR, Messidor-2, and APTOS, show that the quadratic weighted Kappa values of the MAFNet model reach 0.934, 0.917, and 0.936, respectively, significantly outperforming existing DR grading methods such as LANet and MPLNet, demonstrating its significant application value in automated DR grading.
    Keywords:  Adaptive multi-scale model; Diabetic retinopathy; Fine-grained grading; Multi-task learning
    DOI:  https://doi.org/10.1038/s41598-025-17158-z
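    Entry 9 reports quadratic weighted kappa, which penalises a grading error by the squared distance between the predicted and true grade. A minimal sketch of the metric follows, assuming integer grades from 0 to n_classes - 1; the example grades are made up.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic disagreement weights."""
    n = len(y_true)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            num += w * observed[i][j]           # observed weighted disagreement
            den += w * row[i] * col[j] / n      # disagreement expected by chance
    if den == 0:
        raise ValueError("kappa is undefined when only one grade occurs")
    return 1.0 - num / den


# Hypothetical grades for illustration
perfect = quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 5)
reversed_ = quadratic_weighted_kappa([0, 1, 2], [2, 1, 0], 3)
```

    Because of the quadratic weights, confusing grade 0 with grade 4 costs far more than confusing adjacent grades, which suits an ordinal scale such as DR severity.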
  10. Diabetes Res Clin Pract. 2025 Aug 30. pii: S0168-8227(25)00460-7. [Epub ahead of print]228 112446
      Artificial intelligence (AI) enhances thermal image analysis by providing advanced pattern recognition and improving the accuracy of diabetic foot condition detection. AI-driven thermography systems support clinicians, but research on AI for diabetic foot thermography is fragmented, with diverse algorithms and existing reviews focusing mainly on statistical performance. This scoping review aimed to provide a comprehensive overview of AI-based diabetic foot thermography, with a focus on condition detection, performance metrics, clinical implications, and existing research gaps. The search covered PubMed, MEDLINE, CINAHL, ScienceDirect, Scopus, and Google Scholar, with keywords "diabetic foot temperature," "thermal imaging," and "artificial intelligence," including related MeSH terms. Eligible studies included original research and conference proceedings on AI-based foot thermography for diagnosing or monitoring diabetic adults. Literature reviews and meta-analyses were excluded. Sixty articles were reviewed. Most studies addressed increased temperature, followed by decreased temperature and diabetic foot ulcer (DFU) severity classification patterns. AI performance ranged from 61% to 100%. Study environments were 46.67% controlled, 6.67% uncontrolled, and 46.67% unreported. AI applications included clinical decision support, remote monitoring, and reducing clinician workload. AI has advanced diabetic foot detection; however, additional studies in uncontrolled environments are needed to improve accuracy and enhance generalizability under real-world conditions.
    Keywords:  Artificial intelligence; Diabetic foot; Diabetic foot ulcer; Image analysis; Thermal image
    DOI:  https://doi.org/10.1016/j.diabres.2025.112446
  11. Oncology. 2025 Sep 04. 1-35
       INTRODUCTION: Our study aimed to identify risk factors associated with the survival of gastric cancer patients with Type 2 diabetes mellitus (T2DM) and create a risk-scoring system for predicting their survival probabilities.
    METHODS: We gathered data from 1,912 individuals with both gastric cancer and T2DM from the Hong Kong Hospital Authority Data Collaboration Laboratory (HADCL), spanning from 2000 to 2020. We used conventional Cox proportional hazards regression and tree-based machine learning algorithms to construct models for prognosis risk prediction. In the best-performing model, risk factors were identified using SHAP (Shapley Additive Explanations) analysis, and the AutoScore-Survival package was used to develop a risk-scoring system.
    RESULTS: Our findings indicate that older age at cancer diagnosis, longer duration of T2DM, higher body mass index (BMI), central obesity, lower levels of high-density lipoprotein cholesterol (HDL-C), and reduced serum potassium (serum-K) are associated with poorer prognosis for gastric cancer in patients with T2DM. The Random Survival Forests (RSF) model exhibited the best performance, achieving an AUC of 0.870 and a C-index of 0.78. Additionally, we developed two risk-scoring systems using predefined and tuned models, which yielded C-indices of 0.672 and 0.654, respectively, in the test set.
    CONCLUSION: This study enhances our understanding of gastric cancer prognosis in patients with T2DM by identifying significant risk factors and developing risk scoring systems. Further research is needed to elucidate the underlying mechanisms of these risk factors and to validate the risk scoring systems in clinical settings.
    DOI:  https://doi.org/10.1159/000548220
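    Entry 11 compares survival models by C-index. The sketch below implements Harrell's concordance index for right-censored data in its simplest pairwise form, with made-up follow-up times, event flags, and risk scores.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs in which the higher-risk
    patient is the one who experiences the event first."""
    concordant = comparable = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has an observed event
            # strictly before j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: follow-up (years), event flag (1 = death), model risk
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
```

    A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the 0.65 to 0.78 range reported in the abstract.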
  12. Cureus. 2025 Jul;17(7): e89093
      Artificial intelligence (AI) holds significant promise for improving pediatric diabetes management, but its clinical adoption hinges on transparency and validity. Despite growing interest in AI applications, systematic evaluations of these critical aspects remain scarce. This systematic review examines the transparency and validity of AI applications in pediatric diabetes, assessing methodological rigor, reporting standards, and clinical readiness. Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines, we searched Scopus, PubMed, Institute of Electrical and Electronics Engineers (IEEE) Xplore, Web of Science, and Embase for studies employing AI in pediatric diabetes. Ten studies met the inclusion criteria after screening 308 records. Data were extracted on AI methodologies, transparency indicators, and validation approaches. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool. Included studies addressed diverse AI applications, including glucose prediction, hypoglycemia risk assessment, and insulin dosing optimization. Transparency varied widely: 60% of studies disclosed algorithm details, while others omitted critical methodological information. Validation methods ranged from in silico (computer-based) simulations to independent cohorts, but only 30% incorporated external validation. Performance metrics included area under the curve (AUC) and clinical accuracy. Risk of bias was low in 60% of studies, though concerns arose from algorithmic opacity and small validation cohorts. While AI demonstrates potential in pediatric diabetes, inconsistent transparency and insufficient validation limit clinical translation. Future research must prioritize standardized reporting, multicenter validation, and diverse populations to ensure reliability and equity.
    Keywords:  artificial intelligence; machine learning; pediatric diabetes; systematic review; transparency; validity
    DOI:  https://doi.org/10.7759/cureus.89093
  13. Diabetes Obes Metab. 2025 Sep 01.
       BACKGROUND AND AIMS: Estimating the risk of cardiovascular disease (CVD) complications in type 2 diabetes mellitus (T2DM) patients is critical in the medical decision-making process. This study aimed to use a machine learning technique combined with proteomics to develop personalized models for predicting CVD in patients with T2DM.
    METHODS AND RESULTS: In total, 874 patients with T2DM and 2,920 Olink proteins obtained from the UK Biobank were used in this study. Proteins were screened using Cox regression and LASSO regression. A basic model containing clinical features and a full model combining proteome and clinical features were constructed using the random survival forest algorithm. The area under the receiver operating characteristic (ROC) curve (AUC) was used to evaluate the predictive performance of the models and compare them with other CVD predictive models. The full model performed better than the basic model in predicting CVD, with time-dependent AUCs of 0.81 (3 years), 0.74 (5 years) and 0.74 (10 years) versus 0.77, 0.69 and 0.67 for the basic model. We also calculated the risk scores of the Framingham, ASCVD and SCORE2-Diabetes models; the full model outperformed all three. In terms of differentiation accuracy, the net reclassification improvement index and integrated discrimination improvement index showed that the full model can identify high-risk individuals more accurately (accuracy rate: 79% vs. 69%).
    CONCLUSIONS: Proteomics can be used to predict cardiovascular complications in diabetic patients. It is also necessary to consider the applicability of the model due to the limitations of the sample size and the constraints of proteomics in clinical applications.
    Keywords:  Olink protein; T2DM; UK biobank; cardiovascular disease; machine learning
    DOI:  https://doi.org/10.1111/dom.70064
  14. Sci Rep. 2025 Sep 01. 15(1): 32045
      Diabetes is a chronic disorder that disrupts the body's ability to regulate blood glucose (BG) levels, leading to dangerous fluctuations such as hypoglycemia and hyperglycemia. In managing Type 1 Diabetes (T1D), the Dual Hormone Artificial Pancreas (DHAP) has emerged as a promising solution for maintaining optimal BG levels by administering both insulin and glucagon. However, the major challenges in DHAPs are slow dynamics in glucose sensing and delayed insulin absorption. In this paper, a Smart Dual Hormone Artificial Pancreas (SDHAP) with event-triggered Feed-Back (FB)/Feed-Forward (FF) control schemes is proposed to control the BG level of diabetic individuals and reject external disturbances due to food intake or exercise. Firstly, blood glucose levels were classified from features extracted from the T1DiabetesGranada dataset using Machine Learning (ML) algorithms such as K-Nearest Neighbor (KNN) and Support Vector Machine (SVM), and BG levels were predicted using time-series analysis. Secondly, event-triggered feedback controllers, namely Proportional-Integral (PI) and Model Predictive Control (MPC), were designed based on the Bergman Minimal Model (BMM) to deliver the appropriate hormone, insulin or glucagon, based on the predicted results. Finally, the FF controller was designed to reject external disturbances under hypoglycemia and hyperglycemia conditions. The results show the proposed SDHAP is more effective in controlling blood glucose by delivering patient-specific drugs with appropriate dosages based on individualized pathological conditions of T1D patients.
    Keywords:  Bergman minimum model; Blood glucose control; Event-triggered control; Feedforward/feedback control; Machine learning; SDHAP; Type 1 diabetes (T1D)
    DOI:  https://doi.org/10.1038/s41598-025-18085-9
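    To make the event-triggered feedback idea in entry 14 concrete, the sketch below simulates a three-state Bergman-style minimal model under a PI controller that recomputes the insulin rate only when glucose has moved by more than a threshold since the last update. It is a simplified illustration, not the paper's SDHAP: the glucagon arm, the MPC controller, and the feed-forward path are omitted, and every parameter value, gain, and the meal profile are assumptions.

```python
def simulate(minutes=600, dt=1.0, target=110.0, threshold=5.0):
    # Illustrative model parameters (assumptions, not taken from the paper)
    p1, p2, p3, n = 0.028, 0.025, 1.3e-5, 0.09
    g_basal, i_basal = 150.0, 7.0      # untreated baseline glucose, basal insulin
    kp, ki, u_max = 0.02, 0.0002, 2.0  # PI gains and pump limit (assumed)

    g, x, ins = 200.0, 0.0, i_basal    # start hyperglycaemic
    last_sample, last_time = None, 0.0
    integral, u, updates = 0.0, 0.0, 0
    trace = [g]

    for step in range(int(minutes / dt)):
        t = step * dt
        # Event trigger: update the controller only after a large enough change
        if last_sample is None or abs(g - last_sample) > threshold:
            error = g - target
            integral += error * (t - last_time)
            u = min(max(kp * error + ki * integral, 0.0), u_max)
            last_sample, last_time = g, t
            updates += 1

        meal = 3.0 if 120 <= t < 150 else 0.0  # assumed meal disturbance

        # Minimal-model dynamics, integrated with forward Euler
        dg = -p1 * (g - g_basal) - x * g + meal
        dx = -p2 * x + p3 * (ins - i_basal)
        di = -n * (ins - i_basal) + u
        g, x, ins = g + dt * dg, x + dt * dx, ins + dt * di
        trace.append(g)

    return trace, updates


trace, updates = simulate()
```

    Between events the last insulin rate is simply held, so the controller runs fewer times than a fixed-rate loop would; the trade-off is coarser tracking, governed here by `threshold`.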
  15. J Am Med Inform Assoc. 2025 Sep 03. pii: ocaf132. [Epub ahead of print]
       BACKGROUND: Negative descriptors in electronic health records (EHR) contribute to worse health outcomes; studies show they are also more prevalent in EHRs of women and racial minorities and affect downstream research biases. Similar and unique patterns of negative descriptors may also exist in the records of blind patients, including those with diabetic retinopathy. Diabetic retinopathy is a preventable but leading cause of blindness in the US that is disproportionally high among women and racial and ethnic minorities.
    METHODS: Using EHR from a large medical center, we created "matched" cohorts of patients with a type 2 diabetes-only diagnosis and patients with a diagnosis of diabetic retinopathy. We identified previously used and new, disability and patient-related negative descriptors and assessed patterns of biased language in the EHR, comparing patients by retinopathy diagnosis (yes/no), and changes in patterns of language usage pre- and post- the retinopathy diagnosis. We also assessed differences between patients with type 2 diabetes at the intersection of blindness (ie, retinopathy diagnosis) and self-reported gender and race and ethnicity marginalization.
    RESULTS: The EHRs of patients with diabetic retinopathy were significantly more likely than those of patients with diabetes-only diagnoses to contain biased language, across queried negative descriptors. The biasing language was consistently more prevalent in EHRs of patients with diabetic retinopathy identifying as women, Black/African Americans and Hispanic compared to White men and more likely to occur following patients' retinopathy diagnosis.
    CONCLUSIONS: Our study indicates the presence of both disability- and intersectional biases in EHRs. We discuss findings' implications and suggest steps to address them.
    Keywords:  Artificial Intelligence/Machine Learning (AI/ML); bias; disability; medical records; negative descriptors
    DOI:  https://doi.org/10.1093/jamia/ocaf132
  16. PLoS One. 2025;20(9): e0330669
      Diabetic foot ulcer (DFU) is a major complication of diabetes that requires early detection so that timely treatment can prevent serious consequences. Due to peripheral neuropathy, high blood glucose levels, and untreated wounds, DFUs can cause the skin to break down and expose the underlying tissue if not adequately treated. Deep learning (DL) has recently advanced and shown its ability to automate DFU detection and classification by analysing medical images. DL has proven very useful for healthcare professionals, enabling earlier diagnosis and effective treatment of DFU. However, most studies rely on a single dataset (e.g., DFUC2021 or DFUC2020) without external validation or cross-dataset testing, raising concerns about generalizability and trustworthiness. The aim of this study is to develop a robust, reliable, and transparent DFU detection framework that not only performs well but also attends to the image regions that are crucial for DFU detection. To this end, we propose a custom approach, DFU_DIALNet, and, to enhance transparency and interpret the model's decisions, we integrate Grad-CAM and LIME heatmaps to localize ulcer regions precisely. This allows visual verification of the model's focus and clarifies the decision-making process, thereby increasing the model's reliability. On the merged dataset of DFUC2021 and our 500 collected images, DFU_DIALNet achieves 99.33% accuracy, a 99% F1 score, and a 100% AUC score, outperforming the other DL models compared (DenseNet121, MobileNetV2, InceptionV3, EfficientNetB0, ResNet50V2 and VGG16). We checked our model's reliability on two other popular datasets, KDFU and DFUC2020, where our proposed approach gives the highest accuracy, 95.61% and 99.54% respectively, compared to other deep learning approaches. Lastly, we developed a web app using Streamlit to detect DFU efficiently. This study bridges the gap between reliable and interpretable systems with a proposed approach to the efficient detection of DFU.
    DOI:  https://doi.org/10.1371/journal.pone.0330669
  17. J Clin Epidemiol. 2025 Aug 29. pii: S0895-4356(25)00290-2. [Epub ahead of print] 111957
       OBJECTIVES: This study aimed to follow best practice by temporally evaluating existing GDM prediction models, updating them where needed, and comparing the temporal evaluation performance of the ML-based models with that of regression-based models.
    STUDY DESIGN AND SETTING: We utilised new data for the temporal validation dataset with 12,722 singleton pregnancies at the Monash Health Network from 2021 to 2022. The Monash GDM Logistic Regression (LR) model with six categorical variables (version 2) and the Monash GDM Machine Learning model (version 3), along with an extended LR GDM model (version 3), each with eight categorical and continuous variables, were evaluated. Model performance was assessed using discrimination and calibration. Decision curve analyses (DCA) were performed to determine the net benefit of models. Recalibration was considered to improve model performance.
    RESULTS: The development datasets for model versions 2, 3, and the new temporal validation dataset included 21.2%, 22.5%, and 33.5% of pregnant women aged ≥35 years, respectively; 22%, 23.7%, and 24.0% with a body mass index (BMI) ≥30 kg/m²; and GDM prevalence rates of 18%, 21.3%, and 28.6%, respectively. Discrimination was similar across the models, with areas under the curve (AUCs) of 0.72 [95% CI: 0.71, 0.73], 0.73 [95% CI: 0.72, 0.74], and 0.73 [95% CI: 0.73, 0.74] for the version 2 model and the version 3 ML and LR models, respectively. All models exhibited overestimation with calibration slopes of 0.87, 0.99, and 0.87, respectively, which improved with recalibration. DCA showed that all models had a better net benefit than the treat-all and treat-none strategies. For all models, some variability in prediction performance was observed across ethnic groups and parity.
    CONCLUSIONS: Despite significant changes in the background characteristics of the population, we have demonstrated that all models remained robust, especially after recalibration. However, the performance of the original ML model decreased significantly during validation. Dynamic models are better suited to adapt to the temporal changes in baseline characteristics of pregnant women and the resulting calibration drift, as they can incorporate new data without requiring manual evaluation.
    Keywords:  Clinical Epidemiology; Gestational Diabetes; Learning health system; Machine Learning; Prediction model validation
    DOI:  https://doi.org/10.1016/j.jclinepi.2025.111957
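    Entry 17 uses decision curve analysis to compare each model against the treat-all and treat-none strategies. The sketch below shows the standard net benefit calculation at a single threshold probability; the outcomes and predicted risks are hypothetical.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit when every patient is treated regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = GDM) and predicted risks for ten pregnancies
y_true = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
probs = [0.8, 0.6, 0.4, 0.1, 0.2, 0.3, 0.7, 0.1, 0.2, 0.1]

nb_model = net_benefit(y_true, probs, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
nb_none = 0.0  # treating nobody yields zero net benefit by definition
```

    A decision curve repeats this calculation over a range of thresholds; a model is useful at a given threshold when its net benefit exceeds both reference strategies.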