bims-aukdir Biomed News
on Automated knowledge discovery in diabetes research
Issue of 2025–10–05
seventeen papers selected by
Mott Given



  1. Rev Med Chil. 2025 Oct;pii: S0034-98872025001000686. [Epub ahead of print]153(10): 686-694
       AIM: To evaluate and correct the reported sensitivity and specificity values of DART (Diagnóstico Automatizado de Retinografías Telemáticas), an automated artificial intelligence-based (AI-based) screening tool used for diabetic retinopathy (DR) detection in the Chilean public healthcare system, by employing the appropriate gold standard and conditional probability.
    METHODS: Data were obtained from the clinical validation of DART. DR detection capabilities were assessed for three methods: 1) fundoscopy, 2) retinography, and 3) AI-DART. To estimate the true sensitivity and specificity of DART, conditional probability was applied using three hypothetical sensitivity levels for method 2: A) optimistic (90%), B) moderate (80%), and C) pessimistic (70%). Based on these scenarios, corrected sensitivity and specificity values for DART were calculated, along with false negative/positive rates (%FN/%FP) and predictive values (NPV/PPV).
    RESULTS: In all scenarios, corrected sensitivity and specificity values for DART were significantly lower than those reported in the original validation study. Compared to method 3 (AI-based), method 2 (retinography by an ophthalmologist) consistently demonstrated superior performance across all metrics, including %FN, %FP, NPV, and PPV values.
    CONCLUSION: While the integration of new AI-based technologies like DART in healthcare offers promise for enhancing patient care, their implementation must be preceded by validation using the correct gold standard. Reliable clinical decision-making depends on trustworthy diagnostic parameters.
    DOI:  https://doi.org/10.4067/s0034-98872025001000686
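    The correction described in item 1 can be illustrated with the law of total probability. The sketch below is not the authors' formula, which the abstract does not give. It assumes the reference method (retinography) produces no false positives, and it uses a hypothetical originally reported sensitivity of 0.95 purely as a placeholder. The parameter `hit_rate_on_reference_misses` is also an assumption: the chance that the tool flags a diseased eye that the reference missed.

```python
def corrected_sensitivity(se_vs_reference, reference_sensitivity,
                          hit_rate_on_reference_misses=0.0):
    """Return P(tool+ | disease) via the law of total probability.

    Assumes the reference method has no false positives, so every
    reference-positive eye is truly diseased.
    """
    for p in (se_vs_reference, reference_sensitivity,
              hit_rate_on_reference_misses):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Diseased eyes that the reference also detects.
    detected_with_reference = se_vs_reference * reference_sensitivity
    # Diseased eyes that the reference misses but the tool still flags.
    detected_despite_miss = (hit_rate_on_reference_misses
                             * (1.0 - reference_sensitivity))
    return detected_with_reference + detected_despite_miss


# Reference sensitivities from the three scenarios named in the abstract.
SCENARIOS = {"optimistic": 0.90, "moderate": 0.80, "pessimistic": 0.70}

# Hypothetical placeholder: the abstract does not state the reported value.
HYPOTHETICAL_REPORTED_SENSITIVITY = 0.95

corrected = {
    name: corrected_sensitivity(HYPOTHETICAL_REPORTED_SENSITIVITY, ref_se)
    for name, ref_se in SCENARIOS.items()
}

for name, value in corrected.items():
    print(f"{name}: corrected sensitivity = {value:.3f}")
```

    With the default of zero for `hit_rate_on_reference_misses`, the corrected value is always below the value measured against the imperfect reference, which matches the direction of the reported result. Setting that parameter equal to `se_vs_reference` leaves the estimate unchanged, so the size of the correction depends heavily on this assumption.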
  2. Sci Rep. 2025 Oct 03. 15(1): 34486
      Diabetes is one of the main diseases posing a threat to healthcare systems. One of the complications of diabetes is diabetic retinopathy, which, if left untreated, can lead to serious consequences such as blindness. Early detection of this disease is critical to prevent disability and stop the process of vision loss. In our research, we aimed to develop and validate a machine learning model enabling early diagnosis of diabetic retinopathy. We were the first to conduct research using as many as eight public databases and one private database collected during the project implemented by the Ministry of Digital Affairs and the Ministry of Health of Poland. We analyzed 14,402 fundus photographs from patients, leveraging this large dataset to enhance the trustworthiness and validity of our findings. Such a large number of photos emphasizes the credibility and reliability of the results obtained. A significant innovation in our approach includes employing forty-six unique methods for feature selection and extraction, utilizing techniques such as CLAHE, B-CosFire, and Hough transform. We chose XGBoost and Random Forest algorithms for classification, with parameter tuning performed via the Optuna library. Our most successful model, employing the Random Forest algorithm combined with LBP and GLCM for feature extraction, reached a classification accuracy of 80.41%, F1-Score of 74.41%, and AUC of 0.80. The machine learning model we developed proved highly effective in the early detection of diabetic retinopathy. Further refinement is recommended to make this model a viable tool in clinical settings.
    Keywords:  Artificial intelligence; Diabetes; Diabetic retinopathy; Feature extraction; Feature selection; Machine learning
    DOI:  https://doi.org/10.1038/s41598-025-06973-z
  3. Sci Rep. 2025 Sep 30. 15(1): 33742
      Diabetic retinopathy is a leading cause of vision loss, necessitating early, accurate detection. Automated deep learning models show promise but struggle with the complexity of retinal images and limited labeled data. Due to domain differences, traditional transfer learning from datasets like ImageNet often fails in medical imaging. Self-supervised learning (SSL) offers a solution by enabling models to learn directly from medical data, but its success depends on the backbone architecture. Convolutional Neural Networks (CNNs) focus on local features, which can be limiting. To address this, we propose the Multi-scale Self-Supervised Learning (MsSSL) model, combining Vision Transformers (ViTs) for global context and CNNs with a Feature Pyramid Network (FPN) for multi-scale feature extraction. These features are refined through a Deep Learner module, improving spatial resolution and capturing high-level and fine-grained information. The MsSSL model significantly enhances DR grading, outperforming traditional methods, and underscores the value of domain-specific pretraining and advanced model integration in medical imaging.
    Keywords:  CBAM; Diabetic retinopathy grading; Feature pyramid network; Self-supervised learning; Vision transformer
    DOI:  https://doi.org/10.1038/s41598-025-85685-w
  4. Comput Methods Biomech Biomed Engin. 2025 Oct 03. 1-19
      Blood glucose levels are essential for metabolism and brain function; insulin regulates sugar to prevent hypo- and hyperglycemia. Proper control prevents diabetic complications from insulin deficiency or resistance. Rapid, precise diabetes identification is critical for effective care. This study proposes SCAW-Net within TabNet to boost prediction accuracy and computational speed, compared with AdaBoost, XGBoost, Bagging, and Random Forest. Trained on diabetes features and tested on multiple datasets, the model achieved 98.9% accuracy, outperforming others. Consistent results on complex, imbalanced data validate SCAW-Net in TabNet as a promising real-world diabetes prediction tool, supporting timely clinical intervention and improved patient management outcomes.
    Keywords:  Deep Learning; Diabetes Prediction; Feature Selection; Healthcare; SCAW-Net; TabNet
    DOI:  https://doi.org/10.1080/10255842.2025.2566962
  5. Front Endocrinol (Lausanne). 2025;16: 1634358
       Objective: To identify risk factors for hypoglycemia in hospitalized patients with type 2 diabetes mellitus (T2DM) and develop predictive models for hypoglycemia severity based on machine learning algorithms.
    Methods: Adult non-pregnant hospitalized patients diagnosed with T2DM were retrospectively enrolled from the electronic medical record system of the Affiliated Hospital of Qingdao University. Patients were categorized into hypoglycemia groups (mild, moderate-to-severe) or a non-hypoglycemia group based on inpatient venous plasma glucose levels. After data preprocessing, univariate and multivariate analyses were conducted to identify significant predictors. Three predictive models (XGBoost, Random Forest [RF], and Logistic Regression) were subsequently constructed and validated to evaluate their predictive performances.
    Results: From an initial cohort of 8,947 patients, 1,798 patients were included after data screening. Among the evaluated models, the RF model demonstrated the highest predictive accuracy (93.3%) and Kappa coefficient (0.873), followed by XGBoost (accuracy: 92.6%, Kappa: 0.860). Logistic regression exhibited comparatively lower performance (accuracy: 83.8%, Kappa: 0.685). The macro-average area under the ROC curve (AUC) values for RF, XGBoost, and logistic regression were 0.960, 0.955, and 0.788, respectively, highlighting the superior discriminative capability of the RF model. While both XGBoost and RF models identified glycemic control metrics and glucose variability as core predictors for hypoglycemia, the RF model additionally emphasized medication usage, whereas XGBoost prioritized basal metabolic parameters.
    Conclusions: The RF model outperformed XGBoost and conventional logistic regression in predicting hypoglycemia severity among hospitalized T2DM patients. The results emphasize the importance of closely monitoring glucose levels and glucose variability during diabetes management to prevent hypoglycemia. The developed model provides a foundation for implementing preventive strategies to reduce hypoglycemia occurrence in hospitalized patients with T2DM.
    Keywords:  clinical analysis; hypoglycemia; machine learning; risk prediction; type 2 diabetes mellitus
    DOI:  https://doi.org/10.3389/fendo.2025.1634358
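    Item 5 reports accuracy alongside the Kappa coefficient for a three-class outcome (no hypoglycemia, mild, moderate-to-severe). As a minimal sketch of how the two relate, the function below computes both from a confusion matrix. The matrix values are hypothetical and are not taken from the study.

```python
def accuracy_and_kappa(confusion):
    """Compute accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of patients whose true class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    if k == 0 or any(len(row) != k for row in confusion):
        raise ValueError("confusion matrix must be square and non-empty")
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no observations")

    # Observed agreement: share of patients on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total

    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical counts: rows are true classes, columns are predictions,
# in the order none, mild, moderate-to-severe.
example_matrix = [
    [45, 5, 0],
    [5, 40, 5],
    [0, 5, 45],
]

accuracy, kappa = accuracy_and_kappa(example_matrix)
print(f"accuracy = {accuracy:.3f}, kappa = {kappa:.3f}")
```

    Because kappa subtracts the agreement expected by chance, it is lower than accuracy whenever chance agreement is above zero, as in the figures reported for all three models.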
  6. PLoS One. 2025;20(9): e0330454
      Type 2 diabetes mellitus remains a critical global health challenge, with rising incidence rates placing immense pressure on healthcare systems worldwide. This chronic metabolic disorder affects diverse populations, including the elderly and children, leading to severe complications. Early and accurate prediction is essential to mitigate these consequences, yet traditional models often struggle with challenges such as imbalanced datasets, high-dimensional data, missing values, and outliers, resulting in limited predictive performance and interpretability. This study introduces DiabetesXpertNet, an innovative deep learning framework designed to enhance the prediction of Type 2 diabetes mellitus. Unlike existing convolutional neural network models optimized for image data, which focus on generalized attention mechanisms, DiabetesXpertNet is specifically tailored for tabular medical data. It incorporates a convolutional neural network architecture with dynamic channel attention modules to prioritize clinically significant features, such as glucose and insulin levels, and a context-aware feature enhancer to capture complex sequential relationships within structured datasets. The model employs advanced preprocessing techniques, including mean imputation for missing values, median replacement for outliers, and feature selection through mutual information and LASSO regression, to improve dataset quality and computational efficiency. Additionally, a logistic regression-based class weighting strategy addresses class imbalance, enhancing model fairness. Evaluated on the PID dataset and Frankfurt Hospital, Germany Diabetes datasets, DiabetesXpertNet achieves an accuracy of 89.98%, AUC of 91.95%, precision of 89.08%, recall of 88.11%, and F1-score of 88.01%, outperforming existing machine learning and deep learning models. 
Compared to traditional machine learning approaches, it demonstrates significant improvements in precision (+5.1%), recall (+4.8%), F1-score (+5.1%), accuracy (+6.0%), and AUC (+4.5%). Against other convolutional neural network models, it shows meaningful gains in precision (+2.2%), recall (+1.1%), F1-score (+1.2%), accuracy (+1.9%), and AUC (+0.6%). These results underscore the robustness and interpretability of DiabetesXpertNet, making it a promising tool for early Type 2 diabetes diagnosis in clinical settings.
    DOI:  https://doi.org/10.1371/journal.pone.0330454
  7. Cardiovasc Diabetol. 2025 Oct 03. 24(1): 383
       BACKGROUND: Patients with diabetes admitted to emergency care face a higher risk of complications, including prolonged hospital stays, admissions to the intensive care unit and mortality.
    AIM: To develop a machine learning (ML) model to predict 30-day mortality in patients with diabetes admitted to the emergency department (ED).
    DESIGN AND SETTING: A cohort study utilizing data from all nine EDs in Region Skåne from 2017 to 2018. In total, there were 74,611 patient visits, representing 34,280 unique patients aged > 18 years with diabetes or hyperglycemia (glucose > 11 mmol/L). The analysis focused on four groups: men and women aged 40-69 and ≥ 70 years.
    METHODS: Stochastic gradient boosting was employed to develop a model predicting 30-day mortality. Variable importance was assessed using normalized relative influence (NRI) scores. Data from some hospitals were used to train the models, which were then tested on data from other hospitals.
    RESULTS: Key predictors included laboratory values (pH, base excess, pCO2, standard bicarbonate, oxygen saturation, lactate, CRP, and leukocytes), as well as age, triage category, and time to doctor consultation. The sensitivity of the models ranged from 86% to 97%, the specificity from 86% to 94%, and the accuracy from 86% to 94%. The area under the curve (AUC) ranged from 0.84 to 0.93 and Cohen's kappa ranged from 0.34 to 0.45. Positive predictive values accurately identified mortality in 23% to 37% of cases across the four groups.
    CONCLUSIONS: A machine learning model based on routinely collected data in the ED accurately predicted 30-day mortality with high specificity and sensitivity. This approach shows promise in identifying high-risk patients requiring close monitoring and timely interventions.
    Keywords:  Artificial intelligence; Diabetes; Emergency medicine; Gradient boosting; Normalized relative influence; Prediction
    DOI:  https://doi.org/10.1186/s12933-025-02954-8
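    Item 7 reports sensitivity and specificity near 90% together with positive predictive values of 23% to 37%. A short Bayes' rule calculation shows how both can hold when the outcome is uncommon. The 5% event rate below is a hypothetical value chosen for illustration; the abstract does not state the 30-day mortality rate.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    for p in (sensitivity, specificity, prevalence):
        if not 0.0 <= p <= 1.0:
            raise ValueError("inputs must lie in [0, 1]")

    # Expected share of the population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)

    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive value undefined for these inputs")

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical inputs: 90% sensitivity and specificity, 5% event rate.
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.90,
                             prevalence=0.05)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

    Under these assumed inputs, false positives from the large surviving group outnumber true positives, so most flagged patients do not die within 30 days even though the classifier separates the groups well.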
  8. World J Nephrol. 2025 Sep 25. 14(3): 109470
      Diabetes mellitus ranks among the most prevalent non-communicable diseases worldwide, affecting a vast number of individuals. It can impact almost every organ in the body, leading to serious complications such as diabetic retinopathy (DR), diabetic nephropathy, and diabetic neuropathy. Scientific literature indicates that patients with severely compromised kidney function may develop non-responsive DR. Moreover, anaemia in individuals with diabetic kidney disease (DKD) complicates DR and can contribute to significant health issues. Optical coherence tomography (OCT) is a widely used non-invasive imaging tool for diagnosing, managing, and predicting DR. OCT findings in patients with DR and DKD include cystoid macular oedema, diffuse retinal thickening, disruptions in the ellipsoid layer, hyperreflective dots, and damage to the external limiting membrane. The review examines OCT patterns of diabetic macular oedema in DKD, correlating these patterns with declines in kidney function and visual acuity. Additionally, we review various biomarkers linked to DR in DKD patients and the growing importance of novel imaging biomarkers in predicting and connecting the severity of DR with DKD.
    Keywords:  Chronic kidney disease; Diabetic nephropathy; Diabetic retinopathy; Fundus photography; Ocular biomarkers; Optical coherence tomography; Optical coherence tomography angiography
    DOI:  https://doi.org/10.5527/wjn.v14.i3.109470
  9. J Clin Epidemiol. 2025 Sep 26. pii: S0895-4356(25)00334-8. [Epub ahead of print] 112001
       OBJECTIVES: To address the limitations of existing models for research and population health applications in older adults with type 2 diabetes, we developed and validated cardiovascular disease (CVD) and heart failure risk models using linked Medicare claims and electronic health records (2013-2020).
    STUDY DESIGN AND SETTING: The study included adults >65 years with type 2 diabetes and ≥1 HbA1c measurement before cohort entry (defined as the date of a physician/outpatient visit). Using LASSO and XG-boost machine learning algorithms, we predicted 1-year risks of a composite cardiovascular event (myocardial infarction, stroke, coronary artery revascularization, or hospitalization for heart failure). Separate models were developed for patients with and without baseline CVD using claims-only and claims-EHR predictors. Models were trained on 70% of the data and validated on 30%. Model performance was evaluated using c-statistics for discrimination, scaled Brier scores, and calibration curves. We externally validated the models in Clinformatics commercial and Medicare Advantage claims data.
    RESULTS: There were 14,776 patients with baseline CVD [mean (SD) age: 77 (8) years] and 10,679 without baseline CVD [mean (SD) age: 74 (7) years]. Claims-only models achieved a c-statistic of 0.75 and a Brier score of 0.09 in patients with baseline CVD, while in those without baseline CVD, the c-statistic was 0.73 and the Brier score was 0.01. For both subgroups, calibration intercepts were ∼0, with slopes ∼1. Claims-EHR models provided similar performance.
    CONCLUSION: In older adults with diabetes, our models predicted one-year cardiovascular outcomes with good discrimination and accuracy, independently of CVD history.
    PLAIN LANGUAGE SUMMARY: Older adults with type 2 diabetes have a high risk of heart disease, heart failure, and death, yet it is difficult to predict who is most at risk. Most existing prediction tools are designed for use during a single clinic visit, not for large healthcare databases that researchers use to study treatment safety and effectiveness. In this study, we developed computer-based models using Medicare claims data and, for some models, additional information from electronic health records (EHR). These models predicted the chance of having a major heart event or dying within one year. We created separate models for people with and without existing heart disease, because their risk factors differ. Our models accurately predicted risk in both groups. Adding EHR data did not improve performance compared to using claims data alone. This means that claims-only models can still be useful for researchers studying treatments in large healthcare databases. These models can help identify people at higher risk, guide research on diabetes medications, and support better planning for healthcare resources.
    Keywords:  LASSO; Medicare; Prediction Algorithms; cardiovascular diseases; electronic health records; gradient boosted trees; heart failure; machine learning; type 2 diabetes mellitus
    DOI:  https://doi.org/10.1016/j.jclinepi.2025.112001
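    Item 9 evaluates its models with the c-statistic and scaled Brier scores. The sketch below shows one common way to compute these quantities from predicted risks and observed outcomes. The four-patient data set is hypothetical, and the scaling used here (one minus the ratio of the Brier score to that of a model predicting the overall event rate for everyone) is an assumed definition, since the abstract does not spell out the authors' formula.

```python
def c_statistic(y_true, y_prob):
    """Share of (event, non-event) pairs ranked correctly; ties count half."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and outcome."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def scaled_brier(y_true, y_prob):
    """1 - Brier / Brier of a constant prediction at the event rate."""
    event_rate = sum(y_true) / len(y_true)
    reference = event_rate * (1.0 - event_rate)
    if reference == 0:
        raise ValueError("scaled Brier undefined when all outcomes are equal")
    return 1.0 - brier_score(y_true, y_prob) / reference


# Hypothetical outcomes (1 = event within one year) and predicted risks.
outcomes = [0, 0, 1, 1]
risks = [0.10, 0.40, 0.35, 0.80]

print(f"c-statistic  = {c_statistic(outcomes, risks):.3f}")
print(f"Brier score  = {brier_score(outcomes, risks):.4f}")
print(f"scaled Brier = {scaled_brier(outcomes, risks):.4f}")
```

    The reference term `event_rate * (1 - event_rate)` shrinks as events become rarer, so a raw Brier score can look small simply because few patients have the outcome. Scaling by the reference makes scores easier to compare across cohorts with different event rates.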
  10. BMC Med Inform Decis Mak. 2025 Sep 29. 25(1): 344
      
    Keywords:  Diabetic peripheral neuropathy; Machine learning; Predict
    DOI:  https://doi.org/10.1186/s12911-025-03201-6
  11. Sci Rep. 2025 Sep 29. 15(1): 33347
      Biomedical imaging has developed as a non-invasive and effective approach for early disease diagnosis and health monitoring. Diabetes mellitus (DM) is a severe metabolic disease with a high global incidence, characterized by the improper secretion of insulin in the pancreas, which results in elevated blood glucose levels. Moreover, it is one of the most life-threatening illnesses, and a prompt prediction of diabetes is of the highest significance in the present scenario. The analytic models, such as fasting plasma glucose, utilized nowadays are considered to be invasive and time-consuming. So, it is highly essential to develop an easy and non-invasive model for diagnosing DM. For the last few years, several analysis techniques that depend on tongue images have been proposed. The diagnosis of DM is a major subdivision of tongue analysis. Recently, numerous deep learning techniques have been developed and shown to be highly efficient in analyzing DM based on tongue images. This paper presents a Deep Feature Engineering with Crayfish Optimization for Accurate Diabetes Disease Detection via Tongue Image Analysis (DFECO-DDTIA) technique in biomedical imaging. The primary goal of the DFECO-DDTIA technique is to develop an accurate diagnostic method for diabetes using advanced tongue imaging techniques. Initially, the DFECO-DDTIA technique utilizes an upgraded weighted median filtering (Up-WMF) method for noise removal, thereby enhancing image quality. For the feature extraction process, the squeeze-and-excitation-DenseNet (SE-DenseNet) method is employed. Furthermore, the DFECO-DDTIA approach implements the temporal convolutional network (TCN) method for classification. To further optimize the model's performance, the Crayfish Optimisation Algorithm (COA) method is employed for hyperparameter tuning, ensuring the selection of optimal parameters to enhance accuracy. 
To highlight the improved performance of the DFECO-DDTIA approach, a comprehensive experimental analysis is conducted under the Tongue images dataset. The comparison analysis of the DFECO-DDTIA approach revealed a superior accuracy value of 96.91% compared to existing models.
    Keywords:  Biomedical imaging; Crayfish optimization algorithm; Diabetes mellitus; Feature engineering; Temporal convolutional network; Tongue images
    DOI:  https://doi.org/10.1038/s41598-025-14780-9
  12. Diabetes Obes Metab. 2025 Sep 29.
       AIM: Patients with type 2 diabetes mellitus (T2DM) exhibit an elevated prevalence of metabolic dysfunction-associated steatotic liver disease (MASLD) and are at greater risk of liver-related adverse events. Existing non-invasive tools show limited diagnostic performance in this population. This study aims to develop a predictive model that accurately identifies the risk of MASLD among T2DM patients.
    MATERIALS AND METHODS: Clinical data were collected from T2DM patients hospitalised at Nanjing Drum Tower Hospital between January 2018 and May 2025. Eight machine learning methods were developed to predict the risk of MASLD in T2DM patients. The discriminatory ability of the models was evaluated using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), accuracy, recall, negative predictive value, positive predictive value, and F1 score. Calibration curves and decision analysis curves were employed to evaluate the calibration and clinical utility. The models were interpreted using the Shapley additive explanations method, and unsupervised clustering was performed to identify potential high-risk subgroups.
    RESULTS: A total of 3836 T2DM patients constituted the complete dataset, with a MASLD incidence rate of 55.9%. Thirteen feature variables were selected for model construction, and the XGB model achieved optimal overall performance, with an AUROC of 0.873 and an AUPRC of 0.904. Unsupervised clustering identified several high-risk subgroups with distinct metabolic characteristics.
    CONCLUSION: The model developed enables reliable and interpretable MASLD risk prediction in T2DM patients based on selected commonly available clinical data, providing a practical tool for routine identification and stratified management.
    Keywords:  machine learning; metabolic syndrome‐associated liver disease; risk prediction; type 2 diabetes mellitus
    DOI:  https://doi.org/10.1111/dom.70168
  13. J Particip Med. 2025 Oct 02. 17: e69497
       Background: Preventing diabetes is a priority for governments and health systems worldwide. Artificial intelligence (AI) has the potential to inform prevention and planning. However, there is little guidance on how patients, caregivers, and communities are engaged in AI life cycle stages.
    Objective: This formative qualitative study aimed to identify principles for meaningful community engagement. The goal was to support the responsible use of machine learning models in diabetes prevention and management.
    Methods: We conducted a literature scan on how AI or digital health initiatives have engaged patients and communities. A participatory workshop was then organized with patients, caregivers, community organizations, clinicians, and policymakers. In the workshop, we identified and ranked guiding principles for community engagement in AI for population health. We also outlined key considerations for implementing these principles.
    Results: We identified 10 principles for patient and community engagement in AI for health care from 6 papers and developed a conceptual framework for community engagement on AI. A total of 30 workshop participants discussed and ranked the top 6 principles: trust, equity, accountability, transparency, codesign, and value alignment. Participants noted that embedding community engagement in the AI life cycle requires inclusivity and diversity. Additionally, implementers should leverage existing resources and adopt a centralized approach to AI decision-making.
    Conclusions: Our study offers useful insights for community-focused AI deployment that centers the values of patients and communities. The identified principles can guide meaningful engagement on the use of AI in health systems, while future research can operationalize the conceptual framework.
    Keywords:  Canada; artificial intelligence; community engagement; diabetes; patient engagement
    DOI:  https://doi.org/10.2196/69497
  14. J Obstet Gynaecol Res. 2025 Oct;51(10): e70087
       AIM: This study aimed to develop predictive models and establish a risk scoring system to identify risk factors associated with survival in uterine cancer patients with type 2 diabetes (T2D) and estimate their survival probabilities.
    METHODS: Data were collected from the Hong Kong Hospital Authority Data Collaboration Laboratory (HADCL) from 2000 to 2020. Cox proportional hazards regression, survival tree, LASSO Cox regression, boosting, and random survival forest (RSF) were utilized to develop predictive models for survival. Key risk factors were identified through Shapley Additive Explanations analysis, whereas the AutoScore-Survival package facilitated the development of a risk scoring system.
    RESULTS: This cohort study included 2047 uterine cancer patients with T2D. The average survival time was 100.82 (standard deviation: 72.75) months. The RSF model demonstrated the strongest predictive performance, achieving a time-dependent area under the curve (AUC) of 0.823 and a C-index of 0.90. A risk scoring system was created based on several criteria: age at cancer diagnosis, duration of T2D, creatinine level, serum potassium level, low-density lipoprotein cholesterol (LDL-C) level, body mass index (BMI), and triglyceride level. This scoring system classified 31.4% of patients as high-risk, resulting in a 5-year survival probability of 43.5%, about 1.7 times lower than that of the low-risk group.
    CONCLUSION: This study leveraged machine learning to identify key survival predictors and develop a clinically interpretable risk scoring system for uterine cancer patients with T2D. Key predictors, including age at cancer diagnosis, duration of T2D, creatinine levels, serum potassium levels, LDL-C levels, BMI, and triglyceride levels, effectively stratified survival risk. These findings demonstrate the potential of data-driven models to enhance individualized prediction and inform targeted clinical management.
    Keywords:  diabetes mellitus; machine learning; random survival analysis; risk score; uterine cancer
    DOI:  https://doi.org/10.1111/jog.70087
  15. Eur J Prev Cardiol. 2025 Sep 30. pii: zwaf625. [Epub ahead of print]
      
    Keywords:  Cardiovascular Disease; Machine Learning; SGLT-2 Inhibitor; Target Trial Emulation
    DOI:  https://doi.org/10.1093/eurjpc/zwaf625
  16. Ophthalmol Sci. 2026 Jan-Feb;6(1): 100911
       Objective: To evaluate the diagnostic accuracy of 4 multimodal large language models (MLLMs) in detecting and grading diabetic retinopathy (DR) using their new image analysis features.
    Design: A single-center retrospective study.
    Subjects: Patients diagnosed with prediabetes and diabetes.
    Methods: Ultra-widefield fundus images from patients seen at the University of California, San Diego, were graded for DR severity by 3 retina specialists using the ETDRS classification system to establish ground truth. Four MLLMs (ChatGPT-4o, Claude 3.5 Sonnet, Google Gemini 1.5 Pro, and Perplexity Llama 3.1 Sonar/Default) were tested using 4 distinct prompts. These assessed multiple-choice disease diagnosis, binary disease classification, and disease severity. Multimodal large language models were assessed for accuracy, sensitivity, and specificity in identifying the presence or absence of DR and relative disease severity.
    Main Outcome Measures: Accuracy, sensitivity, and specificity of diagnosis.
    Results: A total of 309 eyes from 188 patients were included in the study. The average patient age was 58.7 (56.7-60.7) years, with 55.3% being female. After specialist grading, 70.2% of eyes had DR of varying severity, and 29.8% had no DR. For disease identification with multiple choices provided, Claude and ChatGPT scored significantly higher (P < 0.0006, per Bonferroni correction) than other MLLMs for accuracy (0.608-0.566) and sensitivity (0.618-0.641). In binary DR versus no DR classification, accuracy was the highest for ChatGPT (0.644) and Perplexity (0.602). Sensitivity varied (ChatGPT [0.539], Perplexity [0.488], Claude [0.179], and Gemini [0.042]), whereas specificity for all models was relatively high (range: 0.870-0.989). For the DR severity prompt with the best overall results (Prompt 3.1), no significant differences between models were found in accuracy (Perplexity [0.411], ChatGPT [0.395], Gemini [0.392], and Claude [0.314]). All models demonstrated low sensitivity (Perplexity [0.247], ChatGPT [0.229], Gemini [0.224], and Claude [0.184]). Specificity ranged from 0.840 to 0.866.
    Conclusions: Multimodal large language models are powerful tools that may eventually assist retinal image analysis. Currently, however, there is variability in the accuracy of image analysis, and diagnostic performance falls short of clinical standards for safe implementation in DR diagnosis and grading. Further training and optimization of common errors may enhance their clinical utility.
    Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
    Keywords:  Artificial intelligence; Diabetic retinopathy; Image analysis; Multimodal large language model; Ultra-widefield fundus photography
    DOI:  https://doi.org/10.1016/j.xops.2025.100911
  17. Diabetes Technol Ther. 2025 Sep 29.
      Objective: We aimed to develop and validate natural language processing (NLP) algorithms to identify insulin pump and continuous glucose monitor (CGM) users using unstructured clinical note data from the electronic health record (EHR). Methods: We reviewed a random sample of outpatient clinical notes from endocrinologists to catalog how insulin pump and CGM use was documented. We translated these patterns into regular expressions and used them to build rule-based NLP algorithms, which we iteratively refined. We evaluated the final algorithms in a University of California Los Angeles (UCLA) holdout dataset that included the most recent note from 667 unique patients. We then externally validated the algorithms in a second health system with a different EHR and patient population. Manual chart review served as the gold standard. We assessed performance with measures including sensitivity and specificity. To contextualize algorithm performance, we evaluated the accuracy of billing codes for insulin pump and CGM use within the same UCLA holdout dataset. Results: In the UCLA holdout dataset, our insulin pump algorithm achieved a sensitivity of 0.90 and specificity of 0.89. The CGM algorithm achieved a sensitivity of 0.85 and specificity of 0.84. The combined algorithm identifying both insulin pump and CGM use showed a sensitivity of 0.76 and specificity of 0.92. In comparison, billing codes underperformed: International Classification of Diseases/Current Procedural Terminology (CPT) codes identified insulin pump use with a sensitivity of 0.09 and specificity of 1.00, whereas CPT codes identified CGM use with a sensitivity of 0.68 and specificity of 0.86. For combined device use, billing codes had a sensitivity of 0.06 and specificity of 1.00. External validation demonstrated similarly strong algorithm performance in the second health system. 
Conclusions: We showed that NLP can accurately identify insulin pump and CGM users from unstructured EHR notes, substantially outperforming billing code-based methods. This scalable approach can support system- and population-level evaluations of diabetes technologies.
    Keywords:  CGM; NLP; continuous glucose monitoring; electronic medical record; insulin pump; natural language processing
    DOI:  https://doi.org/10.1177/15209156251383828
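    Item 17 describes rule-based NLP built from regular expressions and checked against manual chart review. The sketch below shows the general shape of such an approach. The patterns, the negation rule, and the example notes are all hypothetical and far simpler than anything a validated algorithm would need; they are not the authors' expressions.

```python
import re

# Hypothetical device patterns, for illustration only.
PUMP_PATTERN = re.compile(r"\b(?:insulin\s+pump|pump\s+therapy)\b",
                          re.IGNORECASE)
CGM_PATTERN = re.compile(
    r"\b(?:cgm|continuous\s+glucose\s+monitor(?:s|ing)?)\b", re.IGNORECASE)

# Hypothetical negation rule: a negating word earlier in the same clause,
# at most 40 characters before the device mention.
NEGATION_BEFORE = re.compile(
    r"\b(?:no|not|denies|declined|without|never)\b[^.;]{0,40}$",
    re.IGNORECASE)


def mentions_use(note, pattern):
    """True if the note has at least one non-negated match of the pattern."""
    for match in pattern.finditer(note):
        text_before = note[:match.start()]
        if not NEGATION_BEFORE.search(text_before):
            return True
    return False


def sensitivity_specificity(predicted, actual):
    """Compare rule output with chart-review labels (lists of booleans)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical notes with chart-review labels: (text, uses pump, uses CGM).
NOTES = [
    ("Continues on insulin pump with CGM.", True, True),
    ("Patient is not using an insulin pump; wears a continuous glucose "
     "monitor.", False, True),
    ("Multiple daily injections. No CGM.", False, False),
    ("Started pump therapy last year.", True, False),
]

pump_predictions = [mentions_use(text, PUMP_PATTERN) for text, _, _ in NOTES]
cgm_predictions = [mentions_use(text, CGM_PATTERN) for text, _, _ in NOTES]
pump_labels = [pump for _, pump, _ in NOTES]
cgm_labels = [cgm for _, _, cgm in NOTES]

print("pump:", sensitivity_specificity(pump_predictions, pump_labels))
print("CGM: ", sensitivity_specificity(cgm_predictions, cgm_labels))
```

    Rules like these are transparent and easy to refine iteratively against reviewed notes, which fits the workflow the abstract describes. A real system would need many more patterns and would have to handle historical or planned use as well as simple negation.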