J Diabetes Metab Disord. 2025 Jun;24(1):115
Background and objectives: Social determinants of health (SDOH) play a critical role in the onset and progression of chronic kidney disease (CKD). Although this role is well established, previous studies have not fully incorporated SDOH when predicting CKD in patients with Type 2 diabetes. To bridge this gap, this study aimed to develop and evaluate machine learning (ML) models that incorporate SDOH to enhance CKD risk prediction in patients with Type 2 diabetes.
Methods: Data were obtained from the 2023 Behavioral Risk Factor Surveillance System (BRFSS), a national survey that collects comprehensive health-related data from adults across the United States. Missing data were addressed using the K-nearest neighbor imputation method, and the Synthetic Minority Oversampling Technique (SMOTE) was applied to balance class distributions. Potential predictive features were selected using correlation coefficient analysis. The dataset was partitioned into training (80%) and testing (20%) subsets, with a 3-fold cross-validation strategy applied to the training data. Seven ML models were developed for CKD risk prediction, including logistic regression (LR), decision tree (DT), K-nearest neighbor (KNN), random forest (RF), eXtreme Gradient Boosting (XGBoost), and an artificial neural network (ANN). Model performance was evaluated using multiple metrics, including the area under the receiver operating characteristic curve (AUROC), precision, recall, F1 score, accuracy, and false positive rate.
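The evaluation metrics named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names `classification_metrics` and `auroc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. AUROC is computed here through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half).

```python
def classification_metrics(y_true, y_pred):
    """Compute precision, recall, F1, accuracy, and false positive rate
    from binary labels (1 = CKD, 0 = no CKD)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    def safe_div(num, den):
        # Return 0.0 when a denominator is empty instead of raising.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "false_positive_rate": safe_div(fp, fp + tn),
    }


def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
area = auroc(y_true, scores)
print(metrics)
print(area)
```

The pairwise AUROC loop is quadratic in the sample size, which is fine for a toy example; for a dataset of the size reported here, a sort-based implementation or an established library routine would be the more practical choice.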
Results: The study included 19,912 Type 2 diabetes patients (weighted sample size: 818,878), among whom 2,924 (weighted 13.92%) had CKD, and 16,988 (weighted 86.08%) did not. Over half of the CKD group (50.4%) were aged 65 or older. Female patients made up the majority of both groups, comprising 53.8% of the CKD group and 50.5% of the non-CKD group. Among the ML models evaluated, the RF model demonstrated the highest predictive performance for CKD, with an AUROC of 0.89 (95% CI: 0.88 - 0.90), followed by the DT model (0.84, 95% CI: 0.83 - 0.85) and XGBoost (0.83, 95% CI: 0.82 - 0.84). The RF model achieved an accuracy of 0.81 (95% CI: 0.81 - 0.81), a precision of 0.79 (95% CI: 0.79 - 0.79), a recall of 0.85 (95% CI: 0.85 - 0.85), and an F1 score of 0.82 (95% CI: 0.82 - 0.82). Additionally, the RF model exhibited strong calibration, reinforcing its reliability as a predictive tool for CKD risk in individuals with Type 2 diabetes.
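As a quick arithmetic check on the reported figures, the sketch below recomputes the F1 score from the stated precision and recall, and derives the unweighted CKD proportion from the raw counts. The variable names are illustrative assumptions; the unweighted proportion is computed here only for comparison and is not a figure reported in the abstract, which gives survey-weighted percentages.

```python
# Reported RF precision and recall.
precision = 0.79
recall = 0.85

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Raw (unweighted) counts from the Results.
n_ckd = 2924
n_no_ckd = 16988
n_total = n_ckd + n_no_ckd

# Unweighted share of patients with CKD, in percent. This differs from the
# survey-weighted 13.92% because BRFSS sampling weights are not applied here.
unweighted_ckd_pct = 100 * n_ckd / n_total

print(round(f1, 2))
print(n_total)
print(round(unweighted_ckd_pct, 2))
```

The recomputed F1 rounds to the reported 0.82, and the two group counts sum to the stated sample of 19,912.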
Conclusion: The study findings underscore the potential of ML models, particularly the RF model, in accurately predicting CKD among individuals with Type 2 diabetes. This approach not only enhances the precision of CKD prediction but also highlights the importance of addressing social and environmental disparities in disease prevention and management. Leveraging ML models with SDOH can lead to earlier interventions, more personalized treatment plans, and improved health outcomes for vulnerable populations.
Supplementary Information: The online version contains supplementary material available at 10.1007/s40200-025-01621-9.
Keywords: Chronic kidney disease; Machine learning; Random forest; Social determinants of health; Type 2 diabetes; XGBoost