bims-aukdir Biomed News
on Automated knowledge discovery in diabetes research
Issue of 2026–07–05
28 papers selected by
Mott Given



  1. J Diabetes Metab Disord. 2026 Dec;25(2): 175
      Diabetic retinopathy (DR) is a leading cause of vision loss in individuals with diabetes, making early and accurate detection essential for preventing severe complications. Automated classification of DR stages from retinal fundus images can assist clinicians in timely diagnosis and management. This study proposes a hybrid approach that combines deep learning-based feature extraction with traditional machine learning classifiers for automatic DR stage classification. Two datasets were used: Dataset A (DiabeticRetinopathy_Messidor_EyePACS_Preprocessed) and Dataset B (APTOS 2019 Blindness Detection). Three hybrid architectures were evaluated: MobileNetV2 with Support Vector Machine (SVM), MobileNetV2 with Random Forest (RF), and VGG16 with SVM. The models exploit convolutional neural networks for extracting discriminative features and employ conventional classifiers for robust decision-making. Grad-CAM and Score-CAM techniques were applied to enhance interpretability by visualizing the regions influencing model predictions. Experimental results on Dataset B show that MobileNetV2 + SVM achieved an accuracy of 85%, precision of 72%, recall of 75%, and F1 score of 73%, while MobileNetV2 + RF achieved an accuracy of 85%, precision of 74%, recall of 72%, and F1 score of 73%. These results indicate that lightweight CNNs combined with traditional classifiers can produce reliable and interpretable DR stage predictions. The study highlights the potential of hybrid models for clinical deployment, offering accurate, transparent, and efficient tools to support ophthalmologists in DR screening and management. Future work will focus on addressing data imbalance, improving model generalizability, and integrating these methods into real-world clinical diagnostic systems.
    Supplementary Information: The online version contains supplementary material available at 10.1007/s40200-026-01938-z.
    Keywords:   Deep learning; Diabetic retinopathy; Fundus images; Grad-CAM; Hybrid models; MobileNetV2; Random Forest; SVM; Score-CAM; VGG16
    DOI:  https://doi.org/10.1007/s40200-026-01938-z
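    As a quick consistency check on the figures above, the reported F1 scores can be recomputed from the reported precision and recall. The sketch below assumes F1 is the plain harmonic mean of the two reported values; if the paper macro-averages per-class scores instead, the match is only approximate. The function name is illustrative and does not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for Dataset B.
svm_f1 = f1_score(0.72, 0.75)  # MobileNetV2 + SVM
rf_f1 = f1_score(0.74, 0.72)   # MobileNetV2 + RF

print(f"MobileNetV2 + SVM: F1 = {svm_f1:.2f}")
print(f"MobileNetV2 + RF:  F1 = {rf_f1:.2f}")
```

    Both values round to 0.73, in line with the 73% F1 scores quoted for the two MobileNetV2 hybrids.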
  2. Sci Rep. 2026 Jul 01.
      Automated diabetic retinopathy (DR) severity grading from retinal fundus images remains challenging because of class imbalance, subtle DR lesion characteristics, and the limited generalization of currently available deep learning models. To address these gaps, this study introduces a novel hybrid CNN-Transformer architecture (EffTNet), which extracts local lesion features with the EfficientNet-B3a network and learns global contextual features with the ViT-B16 network. A novel Attention-Augmented Feature Fusion (AAFF) module adaptively combines complementary features from both branches through channel-wise attention and feature recalibration. Furthermore, pair-aware contrastive learning is used to boost inter-class separability and improve recognition of the underrepresented severe DR categories. The harmonised training dataset combined APTOS 2019, EyePACS and Messidor-2 images, while three other datasets (DDR, DiaRetDB1 and IDRiD) were held out for external validation. EffTNet obtained an accuracy of 98.77%, a precision of 96.30%, a recall of 97.83%, an F1-score of 96.99%, and a Quadratic Weighted Kappa score of 0.947. A Grad-CAM analysis also indicates that the model focuses on clinically relevant retinal lesions. These results suggest that EffTNet grades DR severity with high accuracy and good interpretability, generalizes well across datasets, and could be deployed in large-scale retinal screening systems.
    Keywords:  Automated screening; Deep learning; Diabetic retinopathy; Fundus imaging; Hybrid architecture; Vision transformer
    DOI:  https://doi.org/10.1038/s41598-026-59475-x
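    The Quadratic Weighted Kappa quoted above penalizes a prediction by the squared distance between the predicted and true grade, so confusing grade 0 with grade 4 costs far more than confusing adjacent grades. The following is a minimal pure-Python sketch of the standard formula; the toy grade lists are invented for illustration and are not data from the paper.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic disagreement weights for ordinal grades."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if n_classes < 2:
        raise ValueError("need at least two classes")
    n = len(y_true)

    # Observed confusion matrix: rows are true grades, columns are predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(n_classes))
                 for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            expected = true_hist[i] * pred_hist[j] / n  # chance agreement
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when only one grade occurs")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical five-level DR grades (0 = no DR ... 4 = proliferative DR).
toy_true = [0, 0, 1, 2, 3, 4, 4, 2]
toy_pred = [0, 1, 1, 2, 3, 4, 3, 2]
toy_kappa = quadratic_weighted_kappa(toy_true, toy_pred, n_classes=5)
print(f"toy QWK = {toy_kappa:.3f}")
```

    A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.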
  3. Transl Vis Sci Technol. 2026 Jul 01. 15(7): 3
       Purpose: The purpose of this study was to evaluate whether explicitly modeling diabetes mellitus (DM) without diabetic retinopathy (DR) as its own stage enables deep learning (DL) to detect early retinal changes for early risk identification across the DR severity spectrum.
    Methods: We developed 3 DL classification models that explicitly incorporated DM without DR as a distinct stage at 3-class, 4-class, and 6-class staging granularity, using 6069 color fundus images from the University of Illinois Chicago Hospital, including 1996 no-DM cases, 1852 DM without DR cases, and 2221 DR cases (516 mild, 220 moderate, 103 severe, and 1382 proliferative DR [PDR]). We developed segmentation models for the optic nerve head (ONH) and retinal vessels to quantify the impact of these regions on classification performance through targeted perturbations. We also examined spatial changes in retinal features across DR stages by measuring the alignment between DL saliency maps and ONH location.
    Results: For the 3-class model, areas under the curve (AUCs) were 92.2% (no-DM), 80.3% (DM without DR), and 74.1% (mild DR). For the 4-class model, AUCs were 94.0% (no-DM), 71.9% (DM without DR), 61.5% (mild DR), and 80.3% (referable DR). For the 6-class model, AUCs were 94.0% (no-DM), 65.7% (DM without DR), 65.6% (mild DR), 58.9% (moderate DR), 58.6% (severe DR), and 76.1% (PDR). Vessel perturbations reduced performance by 16% to 31% across models, and greater DR severity was associated with increased saliency-to-ONH distances (Pearson r = 0.69-0.72, P < 0.001).
    Conclusions: Explicitly modeling diabetes without retinopathy improved early-stage discrimination and revealed feature-reliance shifts with DR severity. Vessel- and saliency-based analyses identified subtle retinal changes preceding clinical DR.
    Translational Relevance: Treating diabetes without retinopathy as its own stage may enhance early DR risk identification and aid development of clinically useful artificial intelligence (AI) tools.
    DOI:  https://doi.org/10.1167/tvst.15.7.3
  4. Int J Gen Med. 2026 ;19 601767
       Purpose: To establish a machine learning model to predict diabetic peripheral neuropathy (DPN) in patients with type 2 diabetes mellitus (T2DM) and to explore the role of bilateral brachial-ankle pulse wave velocity and anthropometric indices.
    Patients and Methods: Clinical data of 966 T2DM patients were retrospectively analyzed. According to sensory nerve conduction test results, they were divided into a DPN group and a non-DPN group. The BorutaShap method was employed to screen influencing factors, based on which nine machine learning models were established and compared. Interpretative analysis was performed using the SHAP (SHapley Additive exPlanations) package in Python. The mean absolute SHAP value of feature parameters was defined as their importance and ranked accordingly. The relationship between each feature and DPN was determined based on SHAP values, and quantitative analysis was conducted for continuous variables.
    Results: Among 966 T2DM patients, 469 were diagnosed with DPN, and 13 influencing factors were identified. Of nine machine learning models, the Support Vector Machine (SVM) model performed best (accuracy 0.74 [95% CI: 0.69-0.79], AUC 0.82 [95% CI: 0.77-0.87], recall 0.66 [95% CI: 0.58-0.74], precision 0.80 [95% CI: 0.73-0.86], F1 0.72 [95% CI: 0.66-0.78]). SHAP analysis of the SVM model showed left brachial-ankle pulse wave velocity (LBAPWV) as the most influential predictor (SHAP=0.70), followed by gender, Glucose 0min, fT3, diabetes duration, and hip circumference. Right brachial-ankle pulse wave velocity (RBAPWV) contributed less (SHAP=0.20). Risk factors included LBAPWV, Gender, Glucose 0min, Diabetes duration, Insulin therapy, RBAPWV, UACR, Smoking, Height, and In-hospital blood glucose value; protective factors were fT3, Hip circumference, and C-peptide 180min.
    Conclusion: Machine learning enables robust DPN prediction. Our model revealed asymmetric importance between LBAPWV and RBAPWV, with LBAPWV showing stronger DPN associations. Hip circumference was a protective anthropometric predictor. These findings enhance DPN risk stratification.
    Keywords:  bilateral brachial-ankle pulse wave velocity; hip circumference; machine learning; predictive model; type 2 diabetes mellitus; type 2 diabetic peripheral neuropathy
    DOI:  https://doi.org/10.2147/IJGM.S601767
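    The Methods define feature importance as the mean absolute SHAP value and rank features by it. A minimal sketch of that ranking step is shown below. It assumes a per-patient SHAP matrix has already been computed elsewhere (for example with the SHAP package); the numbers are invented for illustration and are not results from the paper.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    if not shap_rows:
        raise ValueError("need at least one row of SHAP values")
    n_rows = len(shap_rows)
    importance = {
        name: sum(abs(row[col]) for row in shap_rows) / n_rows
        for col, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values: one row per patient, one column per feature.
features = ["LBAPWV", "RBAPWV", "fT3"]
toy_shap = [
    [0.8, 0.1, -0.3],
    [-0.4, 0.2, -0.5],
    [0.6, -0.3, 0.1],
]

ranking = rank_by_mean_abs_shap(toy_shap, features)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

    Taking the absolute value first means a feature that pushes predictions strongly in both directions still ranks as important; the sign of the effect has to be read separately from the relationship between feature values and SHAP values.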
  5. Front Endocrinol (Lausanne). 2026 ;17 1843412
       Objective: This study aimed to develop and externally validate an interpretable machine learning (ML) model for diabetic retinopathy (DR) risk stratification using routine clinical biomarkers, and to explore potential probabilistic dependencies and interactive pathways between clinical biomarkers and DR pathogenesis through Bayesian network modeling.
    Methods: We integrated clinical data from the National Health and Nutrition Examination Survey (NHANES) with an independent hospital cohort (Nantong First People's Hospital). A multi-stage feature selection pipeline (Boruta algorithm and LASSO regression) was utilized to identify core predictors. Eight ML algorithms were benchmarked. To transcend conventional "black-box" predictions, we coupled SHAP (SHapley Additive exPlanations) for personalized interpretability with a Bayesian Network Directed Acyclic Graph (DAG) to map the probabilistic dependency structure among the selected systemic biomarkers.
    Results: The LightGBM algorithm outperformed other classifiers, yielding a robust external validation AUC of 0.841 (95% CI: 0.809-0.862). Fourteen key routine predictors were identified, spanning glycemic control, renal function, and lipid metabolism. Crucially, probabilistic dependency structure via the Bayesian Network revealed a hierarchical pathogenetic topology: rather than parallel associations, latent renal impairment markers (urine protein, BUN, and urine creatinine) and chronic glycemic toxicity (HbA1c) emerged as direct upstream dependency drivers of DR. This structural evidence suggests a probabilistic dependency consistent with the 'kidney-eye crosstalk' hypothesis.
    Conclusion: We successfully deployed a high-performing, non-invasive LightGBM model for early DR screening. By integrating predictive ML with probabilistic dependency structure, this framework not only delivers an accessible, web-based clinical decision support system (CDSS) for resource-constrained settings but also provides preliminary insights into the potential systemic microvascular interplay driving diabetic retinopathy.
    Keywords:  clinical decision support; diabetic retinopathy; interpretable machine learning; risk prediction; routine laboratory biomarkers
    DOI:  https://doi.org/10.3389/fendo.2026.1843412
  6. Int J Retina Vitreous. 2026 Jun 30.
       PURPOSE: To develop and benchmark a unified Deep Learning (DL) pipeline for automated detection and five-level grading of Diabetic Retinopathy (DR), and to derive a high-performance binary screening endpoint (DR vs No DR) suitable for scalable use in resource-limited settings.
    METHODS: A publicly available five-class DR fundus dataset of 3,500 color photographs graded using the International Clinical Diabetic Retinopathy (ICDR) scale was used. A standardized workflow was applied across eight Convolutional Neural Network (CNN) architectures (AlexNet, Densely Connected Convolutional Network 121 (DenseNet121), Residual Network 50 (ResNet50), eXtreme Inception (Xception), Mobile Network Version 2 (MobileNetV2), Efficient Network Version 2 B2 (EfficientNetV2B2), Inception Version 3 (InceptionV3), Visual Geometry Group 16 (VGG16)), including a 70/20/10 train/validation/test split, optional histogram-based contrast enhancement, strong on-the-fly augmentations, and class-balanced sampling. Seven architectures (DenseNet121, ResNet50, Xception, MobileNetV2, EfficientNetV2B2, InceptionV3, and VGG16) were initialized using ImageNet-pretrained weights and trained using a two-stage transfer-learning strategy with backbone freezing followed by partial fine-tuning, while AlexNet was implemented as a custom architecture and trained from randomly initialized weights.
    RESULTS: For five-class grading, VGG16 achieved the highest accuracy (0.7686) and weighted Jaccard index (0.6572), while EfficientNetV2B2 provided the best balanced accuracy (0.6128) and macro Area Under the Receiver Operating Characteristic Curve (AUROC 0.9158). Misclassifications were concentrated between adjacent severity levels. For binary DR screening, modern backbones achieved ≥0.94 accuracy and balanced accuracy; selected models reached AUROC ≈0.982-0.990 and Area Under the Precision-Recall Curve (AUPRC) ≈ 0.987-0.992.
    CONCLUSION: The proposed DL pipeline delivers robust multiclass DR grading and highly discriminative binary screening using widely available CNN backbones. Operating-point calibration enables sensitivity-oriented triage or specificity-oriented confirmation, supporting teleophthalmology and task-shifted DR screening programs to expand coverage and reduce preventable vision loss.
    CLINICAL TRIAL NUMBER: Not applicable.
    Keywords:  Artificial intelligence in ophthalmology; Binary classification; Deep learning; Diabetic retinopathy; Medical image analysis; Multi-class classification
    DOI:  https://doi.org/10.1186/s40942-026-00868-5
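    The Results report both accuracy and balanced accuracy, and the two can diverge sharply on imbalanced grading data. The sketch below computes both from a confusion matrix to show why. The three-class matrix is a made-up example, not data from the study.

```python
def accuracy(confusion):
    """Fraction of all samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def balanced_accuracy(confusion):
    """Mean of per-class recall, so every class counts equally."""
    recalls = [
        row[i] / sum(row)
        for i, row in enumerate(confusion)
        if sum(row) > 0  # skip classes with no true samples
    ]
    return sum(recalls) / len(recalls)


# Hypothetical matrix: rows are true classes, columns are predictions.
# Class 0 dominates, as "No DR" typically does in screening datasets.
toy_confusion = [
    [80, 15, 5],
    [10, 8, 2],
    [2, 3, 5],
]

acc = accuracy(toy_confusion)
bal_acc = balanced_accuracy(toy_confusion)
print(f"accuracy = {acc:.3f}, balanced accuracy = {bal_acc:.3f}")
```

    In this toy case plain accuracy looks better than balanced accuracy because the majority class is classified well while the rarer classes are not, which is the same pattern a gap between the two metrics would suggest in five-class grading.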
  7. Clin Ther. 2026 Jun 29. pii: S0149-2918(26)00217-1. [Epub ahead of print]
       PURPOSE: To evaluate the diagnostic performance of a regulatory-approved (CE-marked) artificial intelligence system (RetCAD) applied to nonmydriatic color fundus photographs for diabetic retinopathy (DR) screening in routine clinical care.
    METHODS: This was a prospective single-center observational diagnostic accuracy study including adults with diabetes who underwent nonmydriatic fundus imaging between January 9, 2023 and August 6, 2024. Nonmydriatic true-color confocal fundus photographs were obtained using the iCare DRSplus camera. RetCAD generated a 5-category DR grade and a continuous severity score according to the International Clinical Diabetic Retinopathy scale. Two board-certified ophthalmologists independently graded images; one served as the reference standard, the second reader's grades were used to assess inter-reader agreement using Cohen's kappa. Diagnostic accuracy was evaluated at 3 prespecified thresholds: any DR; moderate DR or worse (referable DR, defined as International Clinical Diabetic Retinopathy grade 2 or above); and severe DR or worse. Receiver operating characteristic curves and area under the curve were derived from the artificial intelligence severity score. Subgroup analyses included age, diabetes duration, estimated glomerular filtration rate, and glycated hemoglobin.
    FINDINGS: 609 participants (1218 eyes) were screened; 533 participants (1040 eyes) were included (median age 56 years [interquartile range 42-67], 51% female). For detection of referable DR at the eye level, sensitivity was 0.85 (95% confidence interval [CI] 0.79-0.91) and specificity was 0.97 (95% CI 0.96-0.98). The referral rate was 17.0%, with a reference prevalence of 17.2%. The areas under the curves were 0.85 for any DR, 0.96 for referable DR (moderate DR or worse), and 0.98 for severe DR or worse. At the patient level, sensitivity and specificity for referable DR were 0.89 and 0.95, with a referral rate of 22.0%. Sensitivity was significantly lower in participants aged ≥65 years and in those with estimated glomerular filtration rate <60 mL/min/1.73 m². Inter-reader agreement was high (κ = 0.877 unweighted; κ = 0.954 squared-weighted).
    IMPLICATIONS: RetCAD demonstrated high accuracy and strong discriminative ability for identifying referable DR using nonmydriatic fundus imaging, supporting its use as a triage tool for ophthalmic referral in routine clinical practice.
    Keywords:  Artificial intelligence; Automated screening; Diabetic retinopathy; Diagnostic accuracy; Nonmydriatic fundus photography; Retinal imaging
    DOI:  https://doi.org/10.1016/j.clinthera.2026.06.005
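    The eye-level figures in the Findings can be cross-checked against each other: the share of eyes flagged for referral is the sum of true positives and false positives, both of which follow from sensitivity, specificity, and prevalence. The sketch below does that arithmetic with the rounded values from the abstract, so small differences from the reported referral rate are expected. The function names are illustrative.

```python
def expected_referral_rate(sensitivity, specificity, prevalence):
    """Fraction of screened eyes flagged positive by the AI system."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives + false_positives


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of flagged eyes that truly have referable DR."""
    flagged = expected_referral_rate(sensitivity, specificity, prevalence)
    return sensitivity * prevalence / flagged


# Eye-level values for referable DR as reported in the abstract.
rate = expected_referral_rate(0.85, 0.97, 0.172)
ppv = positive_predictive_value(0.85, 0.97, 0.172)
print(f"expected referral rate = {rate:.3f}")
print(f"implied positive predictive value = {ppv:.3f}")
```

    The computed rate of about 17.1% agrees with the reported 17.0% referral rate to within rounding of the inputs. The positive predictive value is derived here and is not a figure stated in the abstract.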
  8. Health Inf Sci Syst. 2026 Dec;14(1): 71
       Background: The key to optimizing insulin administration and simplifying the management of Type 1 diabetes (T1D) lies in accurately predicting future blood glucose (BG) levels. Consistently predicting BG levels is a challenging goal due to interindividual biological variability, data quality issues, and the inherent variability of glucose metabolism.
    Objective: The study aims to predict BG levels across different time horizons by analyzing multimodal data from the BrisT1D and OhioT1DM datasets, comprising CGM measurements, insulin pump data, smartwatch activity data, and dietary carbohydrate data. The purpose of this research was to develop a robust time series model that could handle noise and heterogeneous medical data and that could contribute to clinical decision making for patients with T1D.
    Methods: A variety of time series transformer models were applied, and the best model was AutoBiGluNet, which is a hybrid deep learning model that uses Autoformer and BiLSTM networks to capture global patterns and temporal dependencies. Data were preprocessed by replacing missing values for time series features through linear interpolation and using zero imputation for other numeric values.
    Results: AutoBiGluNet produced the best performance on BrisT1D, achieving an RMSE of 0.0674 ± 0.0006, MAE of 0.0411 ± 0.0004, and R2 of 0.9523 ± 0.0003 across five independent runs. On the external OhioT1DM dataset, the model also showed good generalizability, achieving RMSE of 0.88, MAE of 0.52, and R2 of 0.93 at the 30-minute prediction horizon.
    Conclusion: The model demonstrated strong predictive performance and favorable clinical error-grid results, suggesting potential for future decision-support applications. However, prospective clinical validation is required before considering integration into closed-loop insulin delivery systems.
    Keywords:  Artificial intelligence; Blood glucose prediction; Healthcare; Time series transformer; Type 1 diabetes
    DOI:  https://doi.org/10.1007/s13755-026-00469-4
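    The preprocessing described in the Methods (linear interpolation for time-series gaps, zero imputation for other numeric values) can be sketched in a few lines. This is a simplified illustration, not the authors' code: the handling of gaps at the start or end of a series (copying the nearest observed value) is an assumption, and the sample readings are invented.

```python
def interpolate_linear(series):
    """Fill None gaps by linear interpolation between observed neighbours.

    Gaps before the first or after the last observation are filled with the
    nearest observed value (an assumption; the abstract does not say).
    """
    known = [i for i, value in enumerate(series) if value is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i in range(known[0]):
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):
        filled[i] = series[known[-1]]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


def impute_zero(values):
    """Replace missing numeric entries with 0.0."""
    return [0.0 if value is None else value for value in values]


# Hypothetical CGM readings (mmol/L) and carbohydrate entries (g).
cgm = [5.0, None, None, 8.0, None]
carbs = [None, 12.0, None]

print(interpolate_linear(cgm))
print(impute_zero(carbs))
```

    Zero imputation suits event-like inputs such as carbohydrate or bolus entries, where a missing value usually means nothing was logged, whereas a missing CGM reading is a sensor gap in a continuous signal.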
  9. F1000Res. 2026 ;15 690
       Background: Chronic kidney disease (CKD) is a serious complication of type 2 diabetes (T2DM), particularly in low- and middle-income countries with limited access to early diagnosis. Predicting CKD risk using routine clinical data could enable earlier nephroprotective care. This study developed and internally validated a machine learning-based web application to predict incident CKD among T2DM patients in Indonesia's national health insurance program (Prolanis).
    Methods: A machine learning prediction model was developed using BPJS Prolanis data (2017-2023). Adults (≥18 years) with T2DM and no prior CKD were included. Six algorithms (Logistic Regression, Random Forest, Decision Tree, XGBoost, LightGBM, CatBoost) were trained on 80% of the data and internally validated on the remaining 20% to predict CKD. Performance was assessed via accuracy, precision, recall, F1 score, and AUC. SHAP was used for interpretability.
    Results: Among 7,581 individuals, 864 (11.4%) developed CKD. CatBoost achieved the best performance (AUC = 0.847, accuracy = 0.797, precision = 0.643, recall = 0.525, F1 = 0.578). SHAP identified rapid-acting insulin analogues, amlodipine, furosemide, high blood urea nitrogen, and folic acid as key positive predictors. Advanced age and higher comorbidity burden increased risk, while chronic ischaemic heart disease and dental pulp diseases appeared protective, likely due to healthcare utilization bias. A web-based risk calculator was developed.
    Conclusions: The CatBoost-based web app demonstrated strong discriminative ability for predicting incident CKD in T2DM patients using routine claims data. This tool may support risk stratification in primary care settings across Indonesia and similar low-resource environments.
    Keywords:  chronic kidney disease; machine learning; prediction model; type 2 diabetes mellitus; web-based calculator.
    DOI:  https://doi.org/10.12688/f1000research.179913.2
  10. Br J Ophthalmol. 2026 Jun 29. pii: bjo-2025-328991. [Epub ahead of print]
       AIM: To evaluate a real-world clinical integration of an autonomous artificial intelligence (AI) system (AEYE Diagnostic Screening (AEYE-DS), AEYE Health, USA) for diabetic retinopathy (DR) screening using the Topcon NW500 camera (Topcon, Japan) in an endocrinology clinic.
    METHODS: Adults with type 1 or type 2 diabetes without previously reported DR attending routine endocrinology follow-up were invited to participate. Non-mydriatic, macula-centred fundus photographs were acquired by a novice, non-ophthalmic operator. Images were analysed by AEYE-DS to detect more-than-mild DR (mtmDR). AI-positive results prompted physician counselling and automated referral for internal confirmatory examination.
    RESULTS: Definitive AEYE-DS results were obtained for 95.7% (245/256) of participants without pharmacological dilation. Seventy-six (29.6%) patients screened positive for mtmDR, of whom 34 (44.7%) completed confirmatory examination at the institution's retina clinic; externally completed follow-up was not captured. Four patients (11.8% of those evaluated) required treatment with intravitreal anti-vascular endothelial growth factor therapy or panretinal photocoagulation. Additional previously unrecognised ocular conditions were identified among several AI-positive patients. Patient satisfaction was high, with >80% reporting the screening was easy to use, time-efficient and recommendable.
    CONCLUSIONS: In a real-world endocrinology clinic, autonomous AI screening for DR using AEYE-DS integrated with the Topcon NW500 enabled efficient DR screening and achieved high non-mydriatic imageability. The screening captured a clinically relevant proportion of patients requiring ophthalmic evaluation. Internal referral for confirmatory testing enabled assessment of downstream outcomes. The findings support scalable, point-of-care autonomous screening and a stepped-referral approach for AI-positive patients.
    Keywords:  Artificial Intelligence; Diagnostic tests/Investigation; Imaging; Retina; Telemedicine
    DOI:  https://doi.org/10.1136/bjo-2025-328991
  11. Sci Rep. 2026 Jun 28.
      Early detection of diabetic retinopathy (DR) is crucial for preventing irreversible vision loss; however, existing automated methods often rely on single-modality inputs, which limits their use of the complementary diagnostic value of multimodal imaging. This study presents a cross-attention-based multimodal fusion framework that learns joint representations from semantically aligned fundus photography and OCT datasets for the classification of early-stage diabetic retinopathy (E-DR). The proposed architecture utilises EfficientNetB0 and DenseNet121 as modality-specific encoders, followed by a mid-level cross-attention module that dynamically aligns and fuses vascular and structural features. To enable consistent multimodal training, we develop a harmonised label-mapping strategy and a balanced dataset construction pipeline. Extensive experiments across unimodal and fused configurations demonstrate that our fusion model significantly outperforms baseline models, achieving an overall accuracy of 96% and notable improvements in early-stage detection (e.g., F1-score of 83% for Stage 1). Compared with previous state-of-the-art fusion works that rely on simple concatenation or late-stage integration, our proposed fusion model captures richer cross-modal feature interactions, resulting in improved classification performance on the evaluated datasets. This study provides an exploratory investigation of cross-attention-guided multimodal fusion for retinal image analysis in early DR diagnosis and other ophthalmic screening assessments.
    Keywords:  Cross-attention mechanism; Deep learning in medical diagnosis; Diabetic retinopathy; Fundus and OCT imaging; Multimodal fusion
    DOI:  https://doi.org/10.1038/s41598-026-59735-w
  12. BMC Pregnancy Childbirth. 2026 Jul 01.
       BACKGROUND: Gestational diabetes mellitus (GDM) is a common metabolic disorder during pregnancy, leading to adverse maternal and neonatal outcomes. Exosomal microRNAs (exo-miRNAs) have emerged as promising noninvasive biomarkers due to their stability and regulatory roles in glucose metabolism. However, robust diagnostic models integrating exo-miRNA profiles for early prediction of GDM remain lacking.
    METHODS: In this study, we used the GSE192813 dataset as a discovery cohort to identify differentially expressed exo-miRNAs (DE-exo-miRNAs) in exosomes between GDM and normal glucose tolerance (NGT) pregnancies. After differential expression analysis, five machine learning (ML) feature selection algorithms (LASSO, Random Forest, SVM-RFE, XGBoost, and Boruta) were applied to identify robust predictive DE-exo-miRNA features. Subsequently, ten classification algorithms (Logistic Regression, Random Forest, SVM, XGBoost, LightGBM, CatBoost, KNN, Naïve Bayes, Neural Network, and Decision Tree) were combined with the five feature-selection methods, generating 50 distinct ML models. Model performance was evaluated through repeated 7:3 train-test splits, and the best-performing classifier was externally validated using GSE114860.
    RESULTS: A total of 12 DE-exo-miRNAs were identified in GSE192813, of which a subset of key exo-miRNAs (including miR-423-5p, miR-99a-5p, miR-148a-3p, miR-192-5p, and miR-122-5p) was consistently selected across multiple algorithms. Among the 50 ML combinations, the XGBoost + Boruta model achieved the highest diagnostic accuracy, with an AUC exceeding 0.90 and an overall accuracy greater than 90% in the discovery dataset. External validation in GSE114860 demonstrated stable performance, achieving an accuracy above 80% and good calibration. Functional enrichment analysis of target genes indicated significant involvement in insulin signaling, lipid metabolism, and inflammatory pathways.
    CONCLUSION: This integrative machine learning framework successfully identified a robust exo-miRNA-based predictive signature for GDM. The model exhibited high diagnostic accuracy and generalizability across independent cohorts, highlighting its potential for early, noninvasive screening and precision management of gestational diabetes mellitus.
    Keywords:  Biomarkers; Early prediction; Exo-miRNAs; Gestational diabetes mellitus (GDM); Machine learning
    DOI:  https://doi.org/10.1186/s12884-026-09549-5
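    The 50 models in the Methods come from crossing five feature-selection methods with ten classifiers, each evaluated over repeated 7:3 train-test splits. The sketch below enumerates that grid and generates the splits. It only builds the experiment plan, it does not fit any model, and the unstratified shuffling, sample count, and seed are assumptions made for illustration.

```python
import random
from itertools import product

FEATURE_SELECTORS = ["LASSO", "Random Forest", "SVM-RFE", "XGBoost", "Boruta"]
CLASSIFIERS = [
    "Logistic Regression", "Random Forest", "SVM", "XGBoost", "LightGBM",
    "CatBoost", "KNN", "Naïve Bayes", "Neural Network", "Decision Tree",
]

# Every (feature selector, classifier) pairing: 5 x 10 = 50 pipelines.
pipelines = list(product(FEATURE_SELECTORS, CLASSIFIERS))


def repeated_splits(n_samples, repeats, train_fraction=0.7, seed=0):
    """Yield (train_indices, test_indices) for repeated random 7:3 splits."""
    rng = random.Random(seed)
    n_train = round(n_samples * train_fraction)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        yield order[:n_train], order[n_train:]


# Hypothetical cohort of 20 samples, split three times.
splits = list(repeated_splits(n_samples=20, repeats=3))
print(len(pipelines), "pipelines,", len(splits), "splits each")
```

    With small cohorts, a stratified split that preserves the GDM to NGT ratio in each partition would usually be preferable to the plain shuffle used here.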
  13. Stud Health Technol Inform. 2026 Jun 29. 338 477-481
      Early detection of diabetic neuropathy (DN) remains challenging due to its asymptomatic progression. Machine learning models hold promise for identifying patients at risk, yet most existing studies are limited by small sample sizes that hinder generalizability. In this study, we evaluated minimum sample size requirements for machine learning using a population-based dataset of 77,724 individuals with diabetes. We generated balanced subsets with varying sample sizes (n=100-25,000) and numbers of features (3-46). We trained random forest models on each configuration and evaluated performance using the receiver operating characteristic area under the curve (ROC AUC) and precision-recall area under the curve (PR AUC). Models trained on ≤500 samples showed substantial overfitting and poor generalization. At n=1,000, performance was comparable to the reference model when the feature set was restricted (3 features), but overfitting was observed as the number of features increased (≥20). Performance stabilized for sample sizes ≥3,000, without evidence of overfitting. Our findings indicate that approximately 3,000 samples are required for reliable DN prediction with random forests, and that constraining feature dimensionality is critical when working with smaller cohorts. These results provide practical guidance on data sufficiency and model design for clinical machine learning studies, particularly in data-limited settings.
    Keywords:  Diabetic Neuropathy; Machine Learning; Sample Size
    DOI:  https://doi.org/10.3233/SHTI260889
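    The study design rests on drawing balanced subsets of different sizes from one large cohort. A minimal sketch of that sampling step is given below, assuming binary labels and equal numbers of cases and controls per subset. The toy label vector, the subset sizes shown, and the seed are illustrative and not taken from the paper.

```python
import random


def balanced_subset(labels, n_total, seed=0):
    """Return indices of a random subset with n_total/2 cases and controls."""
    if n_total % 2 != 0:
        raise ValueError("n_total must be even for a balanced subset")
    half = n_total // 2
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    if half > len(cases) or half > len(controls):
        raise ValueError("not enough samples in one class")
    rng = random.Random(seed)
    chosen = rng.sample(cases, half) + rng.sample(controls, half)
    rng.shuffle(chosen)
    return chosen


# Hypothetical cohort: 200 patients with neuropathy (1), 800 without (0).
toy_labels = [1] * 200 + [0] * 800

for size in (100, 200, 400):
    subset = balanced_subset(toy_labels, size, seed=size)
    n_cases = sum(toy_labels[i] for i in subset)
    print(f"n={size}: {n_cases} cases, {size - n_cases} controls")
```

    Each subset would then be used to train a model whose performance is compared with a reference model trained on the full data, which is how the abstract arrives at its sample-size threshold.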
  14. Cureus. 2026 May;18(5): e109761
      Diabetes mellitus (DM) is a serious global health problem due to the large number of people who suffer from it, the complications arising from it, and the significant economic impact associated with it. Early identification of people at risk is essential for implementing preventive strategies that reduce the burden on healthcare systems. In this context, machine learning (ML) techniques have emerged as promising tools to support the timely detection of diabetes and clinical decision-making. The objective of the research was to develop and evaluate the performance of ML models to predict diabetes risk using clinical and sociodemographic data obtained at a primary care clinic in Saltillo, Coahuila, Mexico, run by the Instituto Mexicano del Seguro Social (IMSS), the main healthcare system in Mexico. Data from 1,903 patients were analyzed, taking into account demographic and clinical variables and health habits. The database is not balanced; there are more healthy people than sick people. Supervised algorithms such as support vector machine (SVM), logistic regression (LR), random forest (RF), multilayer perceptron (MLP), and k-nearest neighbors (KNN) were implemented. In addition, to improve performance, ensemble techniques and class balancing methods, such as the Synthetic Minority Oversampling Technique (SMOTE), were applied. Performance was evaluated using metrics such as sensitivity and specificity, among others. The results show that the models identify people without diabetes risk (class 0) well, with sensitivities greater than 90% in several algorithms. In contrast, detection of people at risk (class 1) is poor, with sensitivities between 25% and 40% in the base models. The application of SMOTE and ensemble techniques, such as Extreme Gradient Boosting (XGBoost), increased sensitivity to 79%, although with an associated reduction in specificity (51.9%).
These results show that, although ML models have remarkable potential to support the identification of diabetes risk, class imbalance remains a critical challenge. Improving the sensitivity of these tools is essential to promote confirmatory studies, facilitate early treatment initiation, and reduce the onset of chronic complications. In addition, interpretable and clinically verifiable models can strengthen doctor-patient communication, encourage self-care, and promote the early adoption of preventive interventions aligned with international public health recommendations.
    Keywords:  class imbalance; clinical decision support; diabetes mellitus; early detection; machine learning; risk prediction
    DOI:  https://doi.org/10.7759/cureus.109761
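    SMOTE, the class-balancing method named above, creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a simplified sketch of that idea in pure Python, not the reference implementation or the authors' code. The function name, the neighbour count, and the toy points are assumptions for illustration.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points between minority samples and neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point, excluding itself.
        others = [point for point in minority if point is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority class (at-risk patients) with two scaled features.
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote_like(toy_minority, n_new=5)
for point in synthetic:
    print(tuple(round(value, 3) for value in point))
```

    Oversampling of this kind is normally applied to the training split only, so that the test set keeps the real class distribution. The drop in specificity reported alongside the gain in sensitivity is a typical consequence of shifting the decision boundary toward the minority class.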
  15. Front Nutr. 2026 ;13 1821103
      This review synthesizes AI applications in diabetic foot ulcer (DFU) management, with a particular focus on nutritional and metabolic data integration. Emerging AI methodologies, including image-based dietary assessment, natural language processing-driven chatbots, and continuous glucose monitoring-integrated predictive models, have shown promise in adjacent fields such as general type 2 diabetes management and hemodialysis. However, none have been directly validated in DFU populations, and their applicability to DFU care remains a future research direction rather than a current reality. The main obstacles include the paucity of standardized nutritional data in existing DFU cohorts, methodological barriers in multi-modal data fusion, and the need for robust validation across diverse populations. A future research agenda is proposed, emphasizing the convergence of AI, nutritional science, and multidisciplinary care pathways. By addressing these foundational gaps, AI-enabled approaches may eventually contribute to reducing the global burden of diabetes-related amputations, but substantial methodological and validation work is required before clinical translation can be realistically anticipated.
    Keywords:  artificial intelligence; diabetic ulcers; dietary intervention; machine learning; nutritional assessment; personalized nutrition
    DOI:  https://doi.org/10.3389/fnut.2026.1821103
  16. Sci Rep. 2026 Jun 28.
      Accurate and explainable classification of diabetic foot ulcers (DFUs) is vital for early intervention and patient management. This paper presents a Clinically Audited Hybrid Self-Supervised Framework integrating domain-adaptive SimCLR pretraining, a U-Net-based refiner, and an EfficientNet-B0 classifier for four-class DFU diagnosis (Normal, Abnormal, Ischemic, Infected) by fusing two DFU datasets. The proposed method leverages self-supervised representation learning to capture ulcer-specific texture and perfusion cues, while the U-Net refiner enhances lesion boundary visibility for improved downstream feature extraction. Experiments were conducted on an aggregated DFU dataset of 24,000 images, partitioned into 19,200 for training, 1,200 for validation, and 3,600 for testing (80/5/15 split). The model achieved an accuracy of 99.98%, precision of 99.91%, recall of 99.92%, and F1-score of 99.93%, with an external clinician-audited macro F1 of 0.9993 and Cohen's Kappa of 0.985. Explainability assessment using Grad-CAM yielded a mean IoU of 0.982 ± 0.004 with 87.5% clinical interpretability confirmed by experts. The mobile deployment (TensorFlow Lite) achieved an inference latency of 180 ms/image and a total app footprint of ~150 MB, supporting real-time analysis on mid-range hardware. The proposed framework demonstrates that hybrid self-supervised pretraining with spatial refinement can achieve high diagnostic precision, strong clinical interpretability, and efficient mobile deployment, advancing the feasibility of AI-driven DFU screening in telemedicine and homecare environments.
    Keywords:  Clinical validation; Deep learning; Diabetic foot ulcer (DFU); Edge computing; EfficientNet; Explainable artificial intelligence (XAI); Grad-CAM; Mobile health; Self-supervised learning (SSL); SimCLR; U-Net
    DOI:  https://doi.org/10.1038/s41598-026-55510-z
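      Entry 16 reports Cohen's Kappa between model output and clinician audit. As a reading aid, the following is a minimal sketch of how that agreement statistic is usually computed from two label sequences. The function name and the six example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six images (illustration only).
model = ["Normal", "Infected", "Ischemic", "Abnormal", "Normal", "Infected"]
clinician = ["Normal", "Infected", "Ischemic", "Normal", "Normal", "Infected"]
kappa = cohens_kappa(model, clinician)
print(f"kappa = {kappa:.2f}")
```

      Kappa discounts the agreement expected by chance, so it is normally lower than raw percent agreement on the same data.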
  17. Health Informatics J. 2026 Jul-Sep;32(3): 14604582261464215
      Diabetic Retinopathy (DR) is a medical condition in which high blood sugar levels damage the retina's blood vessels. Existing solutions for multi-class DR identification are computationally intensive and also suffer from low accuracy. There is an immense need for an automated, computationally efficient approach for monitoring DR progression in diabetic patients. The study proposed Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), and Gaussian Naive Bayes (GNB) models for the classification of retinal images into five DR classes (No, Mild, Moderate, Proliferate, and Severe) using spatial features extracted through a Convolutional Neural Network (CNN), textural features extracted through a Grey Level Co-occurrence Matrix (GLCM), and hybrid features obtained by combining the two. The CNN, EfficientNet, PyramidCNN, and Pyramid Vision Transformer (PVT) were also evaluated for the classification of DR stages. The results revealed that the RF model with hybrid features outperformed the other models, with an accuracy of 98.00% and high performance across all evaluation metrics, a 1.00% increase over existing approaches. The EfficientNet model also performed competitively with 97.00% accuracy. The ML models also emerged as computationally efficient in terms of training and inference time for deployment in low-resource clinical environments for automated monitoring of DR progression in diabetic patients.
    Keywords:  Convolutional Neural Network (CNN); Decision Tree (DT); Diabetic Retinopathy (DR); Gaussian Naive Bayes (GNB); Grey Level Co-occurrence Matrix (GLCM); Logistic regression (LR); Pyramid Vision Transformer (PVT); Random Forest (RF); efficientnet; hybrid features; pyramidCNN; spatial features; textural features
    DOI:  https://doi.org/10.1177/14604582261464215
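      Entry 17 combines GLCM texture features with CNN features. The sketch below illustrates the general GLCM idea in plain Python on a tiny 4-level patch. The offset, the two texture statistics, and the placeholder CNN embedding are assumptions made for illustration; the paper's actual settings are not given in the abstract.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised (non-symmetric) grey-level co-occurrence matrix for one offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count the pair (grey level here, grey level at the offset pixel).
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("image too small for this offset")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weights each co-occurrence by the squared grey-level difference."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))


def homogeneity(p):
    """Rewards co-occurrences that lie close to the matrix diagonal."""
    size = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(size) for j in range(size))


# Tiny 4-level patch; a real fundus image would be quantised first.
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(patch, levels=4)                      # right-hand neighbour offset
texture = [contrast(p), homogeneity(p)]

cnn_embedding = [0.12, 0.80, 0.33]             # hypothetical placeholder values
hybrid_features = cnn_embedding + texture      # simple concatenation
print(hybrid_features)
```

      Concatenation is only one way to form hybrid features; the resulting vector would then be passed to a conventional classifier such as a random forest.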
  18. South Med J. 2026 Jul 02. 119(7): 347-351
       OBJECTIVE: To evaluate the diagnostic accuracy of LumineticsCore, a Food and Drug Administration-cleared artificial intelligence-based screening system, in the detection of diabetic retinopathy (DR) in a predominately non-White population and identify potential shortcomings in clinical implementation that should be addressed to mitigate disparities in health care.
    METHODS: Data were acquired via retrospective chart review and included 225 patients with a diagnosis of diabetes mellitus who were screened for DR using the LumineticsCore system (formerly IDx-DR) at the University Medical Center New Orleans (Louisiana). Metrics to assess DR detection efficacy included sensitivity, specificity, positive and negative predictive values, likelihood ratios, and indeterminate screening result rates. Stratified analyses regarding associated medical comorbidities also were performed. Clinic follow-up rates and time frames also were noted per screen result group.
    RESULTS: The study population had a diverse demographic profile, with 69.0% of subjects identifying as African American, 13.1% Hispanic, 10.7% White, 3.6% Asian, and 3.6% of subjects who either did not identify with the above racial/ethnic groups or declined to self-identify. The system yielded favorable performance measures in the study population regarding detection accuracy. It was found, however, that although most patients with a positive screen had ophthalmology referrals placed, 29.8% of patients with positive screen results did not attend their scheduled ophthalmology visit.
    CONCLUSIONS: The LumineticsCore system was found to be a reliable screening test for the detection of DR in the study population. The relatively high no-show rate for scheduled ophthalmology referrals in patients with positive screen results, however, sheds light on an implementation system issue in need of further evaluation.
    DOI:  https://doi.org/10.14423/SMJ.0000000000001990
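      The screening metrics named in the METHODS of entry 18 all derive from a 2x2 table of screen result against reference diagnosis. A minimal sketch follows, using made-up counts that are not the study's data (the abstract does not report the underlying table).

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from 2x2 confusion-table counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),                        # positive predictive value
        "npv": tn / (tn + fn),                        # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts for illustration only.
metrics = screening_metrics(tp=40, fp=10, fn=5, tn=145)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

      Predictive values depend on disease prevalence in the screened population, whereas sensitivity, specificity, and likelihood ratios do not, which is one reason all of them are commonly reported together.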
  19. J Diabetes Investig. 2026 Jul 01.
       AIMS: The increasing prevalence of type 2 diabetes mellitus (T2DM) has led to an increase in diabetic kidney disease (DKD), which is presently a major cause of dialysis initiation. Accurately predicting renal function decline is clinically important. This study aimed to evaluate whether machine learning can predict renal outcomes based on one-year fluctuations in the estimated glomerular filtration rate (eGFR) in patients with T2DM. Unlike prior studies, which predicted renal decline only after the eGFR dropped below 60 mL/min/1.73 m2 and used pre-SGLT2 inhibitor data, we analyzed post-SGLT2 data and assessed the risk at any clinical time point.
    MATERIALS AND METHODS: We included outpatient T2DM cases with a mean eGFR ≥45 mL/min/1.73 m2. Using one year of retrospective data, we predicted ≥30% eGFR decline over the subsequent 3 years. Data were extracted semiannually, allowing multiple datasets for each patient.
    RESULTS: Among the 21,872 datasets, 7,216 met the criteria for analysis and 125 reached the primary endpoint. A baseline model using demographic and average laboratory data achieved an AUC of 0.77. When additional features were added, including smoothed mean proteinuria, glycosylated hemoglobin range, proteinuria intercept, variability indices (e.g., SDs of eGFR, creatinine, hemoglobin, and uric acid), regression coefficients, laboratory extremes, and medication duration, the area under the curve improved to 0.82.
    CONCLUSIONS: Our machine learning model accurately predicted the renal outcomes at any time point in patients with preserved renal function. Routinely collected clinical data may allow earlier identification of high-risk patients, in turn enabling timely therapeutic intervention.
    Keywords:  Diabetic Nephropathies; Machine Learning; Prognosis
    DOI:  https://doi.org/10.1111/jdi.70371
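      Entry 19 derives variability and trend features from one year of laboratory values and defines the endpoint as a decline in eGFR of at least 30%. The sketch below shows one plausible way to compute such features for a single series. The feature names, the sampling times, and the choice of baseline for the endpoint are assumptions, since the abstract does not specify them.

```python
from statistics import mean, stdev


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def variability_features(months, values):
    """Summary, variability, and trend features for one laboratory series."""
    slope, intercept = linear_fit(months, values)
    return {
        "mean": mean(values),
        "sd": stdev(values),                  # sample standard deviation
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
        "slope_per_month": slope,
        "intercept": intercept,
    }


def reached_endpoint(baseline_egfr, later_egfr, threshold=0.30):
    """True if eGFR fell by at least `threshold` relative to baseline."""
    return (baseline_egfr - later_egfr) / baseline_egfr >= threshold


# Hypothetical one-year eGFR series (mL/min/1.73 m2) for a single patient.
months = [0, 3, 6, 9, 12]
egfr = [80, 78, 77, 74, 71]
features = variability_features(months, egfr)
print(features)
print(reached_endpoint(features["mean"], 53))
```

      Here the one-year mean is used as the baseline purely for illustration; a different baseline definition would change which patients reach the endpoint.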
  20. Front Med (Lausanne). 2026 ;13 1806848
       Background: Gestational diabetes mellitus (GDM) is a prevalent pregnancy complication. Current diagnostic approaches are inherently retrospective, necessitating the development of effective early prediction models for timely intervention and improved outcomes.
    Objective: This study aimed to develop and validate a prediction model for GDM risk by integrating first-trimester clinical and metabolomic indicators.
    Methods: A retrospective cohort study was conducted involving 342 singleton pregnancies that received routine antenatal care and underwent mid-pregnancy oral glucose tolerance tests (OGTT) from January 2022 to December 2024. Participants were randomly allocated to a training set (n = 239) and a validation set (n = 103) in a 7:3 ratio. Core predictors were identified through a univariate analysis, LASSO regression, and subsequent multivariable logistic regression. Four machine learning models, namely Random Forest, Support Vector Machine (SVM), Gradient Boosting Machine, and Logistic Regression, were constructed and compared. Performance was evaluated by the area under the curve (AUC), calibration curves, and decision curve analysis. Model interpretability was assessed using SHapley Additive exPlanations (SHAP) values.
    Results: A multivariable analysis identified seven independent predictors: pre-pregnancy BMI, first-trimester fasting plasma glucose, triglycerides, C-reactive protein, and the branched-chain amino acid score (risk factors), as well as pregnancy-associated plasma protein-A and 1,5-anhydroglucitol (protective factors). In the validation set, the SVM model achieved optimal performance with an AUC of 0.861 (95% confidence interval (CI): 0.772-0.949). Calibration and decision curve analyses demonstrated good agreement between predicted and observed risks and affirmed clinical utility across a wide threshold probability range.
    Conclusion: A prediction model integrating first-trimester clinical and metabolomic markers was successfully developed and validated. The model demonstrates favorable predictive accuracy and clinical applicability, offering potential as an auxiliary tool for early risk stratification and personalized GDM management. Future multi-center external validation is warranted to confirm generalizability.
    Keywords:  early diagnosis; gestational diabetes mellitus; machine learning; metabolomics; nomogram; prediction model
    DOI:  https://doi.org/10.3389/fmed.2026.1806848
  21. J Diabetes Sci Technol. 2026 Jun 30. 19322968261464150
      
    Keywords:  artificial intelligence (AI); diabetes care; digital divide; digital health; digital management; health equity
    DOI:  https://doi.org/10.1177/19322968261464150
  22. JMIR AI. 2026 Jul 03. 5 e85248
       BACKGROUND: Screening for type 2 diabetes (T2D) is not optimal, leading to a large number of patients being undiagnosed. Recently, deep learning (DL) applied to chest radiographs (CXRs) has shown promise for opportunistic T2D prediction. A prior study in a predominantly suburban non-Hispanic White cohort achieved an area under the curve (AUC) of 0.84 for prevalence. In this study, we evaluate the performance and generalizability of this DL model in an urban cohort with greater racial diversity, higher social deprivation, and higher T2D prevalence. We further assess whether integrating DL predictions with BMI and demographic variables improves T2D prediction beyond demographics and BMI alone.
    OBJECTIVE: This study aims to externally validate a previously developed DL-based CXR model for T2D prediction in a diverse urban population, to assess its performance for both prevalent and incident T2D, and to determine whether combining DL predictions with demographics and BMI improves predictive performance.
    METHODS: We studied adults (2010-2020) from a tertiary academic medical center in Chicago with at least one ambulatory CXR. First, we performed external validation of a previously developed DL-CXR model by applying it directly to our cohort. Second, we evaluated whether combining the DL model output with additional data, demographics, BMI, and social deprivation index improved the performance. T2D prevalence was modeled using extreme gradient boosting, while incidence was assessed with Cox proportional hazards models. Model performance was compared using AUC and concordance, and feature contributions were evaluated using feature importance and odds ratios.
    RESULTS: Among 39,908 patients (n=21,311, 53.4% non-Hispanic Black; n=9179, 23% Latino; and n=5587, 14% non-Hispanic White), 26% (n=10,376) had T2D at their first CXR. The previously developed DL-T2D model maintained discrimination for prevalent T2D in this diverse urban cohort, with similar performance across racial groups (Latino: 0.818; non-Hispanic White: 0.819; non-Hispanic Black: 0.790), supporting generalizability. Adding DL output to demographics and BMI improved prediction compared with clinical variables alone (AUC 0.808 vs 0.766; P<.001). For 3-year incident T2D, the full model achieved an AUC of 0.709 with concordance of 0.707; individuals in the highest risk quartile had a 7-fold higher incidence.
    CONCLUSIONS: In a diverse urban cohort, a previously developed DL model applied to CXRs provided significant incremental value beyond demographics and BMI for T2D risk prediction. Despite substantial differences in population characteristics compared with the derivation cohort, the DL model remained effective for T2D screening. Incidence prediction was less accurate than prevalence, highlighting the need for further refinement, potentially incorporating hemoglobin A1c when available. Although racial disparities in prevalence exist, predictive performance was comparable across groups. These findings support the generalizability of CXR-based DL for opportunistic T2D screening in diverse populations.
    Keywords:  chest x-rays; deep learning; multimodal machine learning; neural networks; risk assessment; risk prediction; type 2 diabetes
    DOI:  https://doi.org/10.2196/85248
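      Entry 22 compares models by AUC. As an illustration of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels and both score lists are invented for the example and are unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores for eight patients (1 = has T2D).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
clinical_only = [0.9, 0.6, 0.4, 0.3, 0.7, 0.5, 0.2, 0.1]
clinical_plus_dl = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

auc_clinical = roc_auc(labels, clinical_only)
auc_combined = roc_auc(labels, clinical_plus_dl)
print(auc_clinical, auc_combined)
```

      The pairwise loop is quadratic in cohort size, so it suits small examples; rank-based implementations are the usual choice for cohorts of tens of thousands.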
  23. Front Cardiovasc Med. 2026 ;13 1801899
       Introduction: Carotid plaque is a critical risk factor for cardiovascular disease and reflects the extent of the atherosclerotic burden. Compared with non-diabetic individuals, patients with type 2 diabetes mellitus (T2DM) have an elevated likelihood of developing carotid plaques. Consequently, building predictive models tailored to this high-risk population holds significant clinical value for early prevention and management of cardiovascular events.
    Methods: A total of 2,288 patients were included in this study, 1,716 (75.0%) of whom had plaques detected using ultrasound. Baseline data, including demographic characteristics, medical history, and laboratory indicators, were collected, and seven machine-learning algorithms were applied to establish the prediction model. Feature importance was quantified and presented using Shapley Additive Explanations (SHAP). The performance of the model was evaluated using indicators such as area under the curve (AUC-ROC), sensitivity, and specificity.
    Results: The study showed that the logistic regression model performed the best in terms of discrimination, calibration, and clinical utility. Through SHAP interpretability analysis, key risk factors such as age, body mass index, glycated hemoglobin, history of hypertension, monocyte count, neutrophil percentage, red blood cell count, sex, estimated glomerular filtration rate, and statin use were identified. This study demonstrated that an effective risk-prediction model can be established using conventional clinical variables and machine learning.
    Conclusions: The developed predictive model can help primary care providers detect patients at heightened risk of carotid atherosclerotic plaques, enabling the delivery of targeted preventive strategies and ultimately improving clinical outcomes.
    Keywords:  SHAP; carotid plaque; machine learning; primary care; type 2 diabetes mellitus
    DOI:  https://doi.org/10.3389/fcvm.2026.1801899
  24. ArXiv. 2026 Jun 25. pii: arXiv:2606.18640v2. [Epub ahead of print]
      Glucose forecasting algorithms are an important aspect of glycemic control management in type 1 diabetes. So far, the research community has developed numerous algorithms and models for forecasting. However, it is well-recognized that the lack of standardized model performance evaluation benchmarks makes fair comparison difficult and hinders further innovation, and thus benchmark standardization is in urgent need. Furthermore, many published glucose forecasting algorithms are limited to CGM data alone, ignoring other multimodal signals such as insulin dosing and carbohydrate intake. Here, we introduce MetaboNet-Bench, a benchmark for multimodal glucose forecasting for patients with type 1 diabetes that provides an extensible open-source evaluation framework for comparison of glucose forecasting algorithms that leverage glucose, insulin, and carbohydrate data. We then demonstrate its utility by benchmarking several recently published glucose forecasting models and a custom multimodal time-series model, representing different model architectures. The results show that the benefit of adding data modalities is conditioned on the complexity of the model and that incorporating more clinical metrics helps identify meaningful gaps to fill for future research.
  25. BJS Open. 2026 Jul 03. pii: zrag054. [Epub ahead of print]10(4):
       BACKGROUND: Predicting postoperative body mass index (BMI) trajectories and long-term type 2 diabetes (T2D) remission after bariatric surgery remains challenging. Existing models often rely on baseline variables only and fail to incorporate dynamic postoperative changes. This study aimed to develop and validate a multicentre machine-learning framework that predicts individualized BMI trajectories and T2D remission using routinely available preoperative data and time-dependent weight evolution.
    METHODS: This multicentre retrospective cohort study included adult patients who underwent Roux-en-Y gastric bypass or sleeve gastrectomy across 11 European centres (2012-2023). Variables with > 30% missing data were excluded; remaining missing values were imputed iteratively. A two-stage approach was used: a regression model predicting postoperative BMI at 3-60 months using an autoregressive design; and a classification model predicting T2D remission using baseline features and predicted BMI trajectories. Internal performance was evaluated with ten-fold and leave-one-clinic-out cross-validation; external validation used an independent cohort from Linköping, Sweden.
    RESULTS: Of the 11 457 patients initially identified, 9652 patients with complete baseline and follow-up information were used for the analysis. The best BMI model (HistGradientBoosting) achieved a root mean square error (RMSE) of 1.11 kg/m2 (95% confidence interval 1.07 to 1.14) and a mean absolute error (MAE) of 0.62 kg/m2 across clinics; external testing showed an RMSE of 1.12 kg/m2 (95% confidence interval 1.11 to 1.12) and an MAE of 0.63 kg/m2. The T2D remission classifier (XGBoost) obtained a Macro F1 score of 0.88 (precision 0.87, recall 0.88), with an external F1 score of 0.89. Incorporating predicted BMI trajectories improved discrimination compared with baseline-only models (C-index 0.95 versus 0.93).
    CONCLUSION: A two-stage machine-learning framework has high predictive performance for postoperative BMI and T2D remission up to 5 years after bariatric surgery. Dynamic incorporation of predicted weight trajectories enhances metabolic risk prediction and supports individualized counselling and postoperative management.
    Keywords:  diabetes remission; machine learning; postoperative weight loss; risk prediction
    DOI:  https://doi.org/10.1093/bjsopen/zrag054
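      Entry 25 reports RMSE and MAE and uses leave-one-clinic-out cross-validation. The following sketch shows both error measures and a simple way to generate clinic-wise folds. The BMI values and clinic identifiers are hypothetical, and the fold generator is an illustration of the general scheme rather than the authors' pipeline.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error; large individual errors weigh more heavily."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error; every error contributes in proportion to its size."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def leave_one_clinic_out(clinic_ids):
    """Yield (held_out_clinic, train_indices, test_indices) for each clinic."""
    for held_out in sorted(set(clinic_ids)):
        train = [i for i, c in enumerate(clinic_ids) if c != held_out]
        test = [i for i, c in enumerate(clinic_ids) if c == held_out]
        yield held_out, train, test


# Hypothetical observed and predicted BMI values (kg/m2) for four patients.
observed_bmi = [31.0, 29.5, 28.0, 27.5]
predicted_bmi = [30.0, 29.5, 29.0, 25.5]
clinic_ids = ["A", "A", "B", "C"]

print(rmse(observed_bmi, predicted_bmi), mae(observed_bmi, predicted_bmi))
folds = list(leave_one_clinic_out(clinic_ids))
print(folds[0])
```

      RMSE is never smaller than MAE on the same predictions, and a wide gap between the two suggests that a minority of patients have comparatively large errors.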
  26. J Diabetes Investig. 2026 Jun 30.
       AIMS/INTRODUCTION: Gestational diabetes mellitus (GDM) is one of the most frequent pregnancy complications. Investigating clinical risk factors for adverse pregnancy outcomes in women with GDM would help predict and prevent neonatal complications. We developed a machine learning model to discover risk factors for adverse pregnancy outcomes.
    MATERIALS AND METHODS: Women with GDM from tertiary hospitals in Korea were included (n = 305, discovery cohort; n = 911, validation cohort). Supervised machine learning classification models, including ExtraTree, RandomForest, GradientBoosting, AdaBoost, Bagging, XGBoost, and Light Gradient Boosting Machine (LGBM), were developed to predict adverse pregnancy outcomes. Outcomes included large for gestational age (LGA), small for gestational age (SGA), low Apgar score, and preterm delivery. The top-ranked risk factors identified through feature importance were further validated using binary logistic regression analysis.
    RESULTS: In predicting LGA, the RandomForest model achieved the highest AUROC of 0.726 on the validation cohort. For SGA, the RandomForest model achieved the highest AUROC of 0.628. For low Apgar score and preterm delivery, the ExtraTree model showed the best performance with AUROCs of 0.689 and 0.616, respectively.
    CONCLUSIONS: This study presents machine learning models as a foundational tool for identifying factors associated with adverse pregnancy outcomes in women with GDM. These models may serve as a foundation for future development of clinical decision support tools.
    Keywords:  Adverse birth outcomes; Diabetes, gestational; Machine learning
    DOI:  https://doi.org/10.1111/jdi.70364
  27. Digit Health. 2026 Jan-Dec;12: 20552076261461391
       Objective: To determine which data augmentation technique yields the best performance for deep learning models in classifying age-related macular degeneration (AMD), diabetic retinopathy (DR), glaucoma, and normal fundus images.
    Methods: This study employed an in silico experimental study design. Six data augmentation techniques: Colour Jitter, Contrast-Limited Adaptive Histogram Equalisation (CLAHE), Rotation, Translation, Gaussian Noise, and Poisson Noise were evaluated using controlled experiments with an EfficientNet-B0 model on a balanced dataset of 1,200 fundus photographs, 250 cases each for AMD, DR and glaucoma, and 450 normal fundus images curated from four main publicly available databases. The experiments were conducted in four phases: baseline, single augmentations, combined augmentations, and the impact of augmented dataset volume. Evaluation metrics and visualisations were computed with Python-based statistical and visualisation libraries.
    Results: The results from this study show that data augmentation consistently increased the area under the curve (AUC) from 96.55% to 97.23% and accuracy from 85.83% (baseline) to 89.58%. The results indicate that augmentation effectiveness is disease-specific: Rotation and Colour Jitter yielded the highest sensitivity for AMD (99%), CLAHE maximised sensitivity for Diabetic Retinopathy (96%), and Translation was most effective for Glaucoma (83%). While single augmentations provided descriptive clinical improvements, the comprehensive combination of photometric, geometric, and noise augmentations yielded the best overall performance and achieved a statistically significant improvement over the baseline (Mean bootstrapped AUC = 0.9800, 95% CI: 0.9678, 0.9895; p= 0.0050).
    Conclusion: Data augmentation effectiveness is disease-dependent; specific pathologies respond better to distinct augmentation techniques due to different retinal biomarkers.
    Keywords:  age-related macular degeneration; data augmentation; deep learning; diabetic retinopathy; glaucoma
    DOI:  https://doi.org/10.1177/20552076261461391
  28. Digit Health. 2026 Jan-Dec;12: 20552076261450827
       Background: With the increasingly widespread application of artificial intelligence technology, generative artificial intelligence has become an important tool for people to obtain health information due to its convenience and flexibility in health education or health promotion. However, the readability and accuracy of such AI-generated materials still need to be evaluated.
    Objective: To comprehensively evaluate and compare the quality and readability of health education texts about diabetes generated by different generative artificial intelligence (AI) models.
    Methods: We followed a fixed list of ten questions without modifications, systematically presenting the same inquiries to seven generative AI models and exporting their results into defined forms in the text generation process. Five experts were invited to evaluate the texts based on five criteria. The readability index, a readability formula, was used to evaluate the text's readability. Kendall's coefficient of concordance was employed to assess inter-rater reliability. The linear mixed model was used to compare the differences in five dimensions and readability among the health education texts generated by different AI models.
    Results: Kimi-K1.5 and Doubao attained the highest overall scores in scientific accuracy, whereas iFlytek Spark-V3.5 received lower scores compared to other models. In terms of practical value and logical clarity, Kimi-K1.5 received the highest scores, while iFlytek Spark-V3.5 scored the lowest. In the dimension of reference basis, Kimi-K1.5 and ERNIE Bot-3.5 received relatively high scores, while iFlytek Spark-V3.5 and Doubao scored lower. In the assessment of text readability, higher R-value scores indicate poorer readability. The health education text generated by Doubao had the highest R-value, while iFlytek Spark-V3.5 had the lowest R-value.
    Conclusions: Kimi-K1.5 performed better across multiple assessment parameters in the overall evaluation of diabetes-related health education texts created by different generative AI models. Notably, among all the models tested, iFlytek Spark-V3.5 showed the best readability.
    Keywords:  diabetes; generative artificial intelligence model; health education texts
    DOI:  https://doi.org/10.1177/20552076261450827
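      Entry 28 uses Kendall's coefficient of concordance to assess agreement among expert raters. A minimal sketch of the untied form, W = 12S / (m^2 (n^3 - n)) for m raters ranking n items, is given below. The three rank lists are hypothetical, and the sketch assumes each rater supplies a complete ranking without ties, so no tie correction is applied.

```python
def kendalls_w(rankings):
    """Kendall's W for m raters who each rank the same n items (no ties)."""
    m = len(rankings)
    n = len(rankings[0])
    if m < 2 or n < 2 or any(len(r) != n for r in rankings):
        raise ValueError("need at least 2 raters and 2 items, all rankings same length")
    # Total rank received by each item across all raters.
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((total - mean_rank_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical rankings of four texts by three raters (1 = best).
rankings = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]
w = kendalls_w(rankings)
print(f"W = {w:.3f}")
```

      W ranges from 0 (no agreement) to 1 (identical rankings). Rating scales that allow equal scores would need average ranks and a tie correction, which this sketch omits.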