Front Endocrinol (Lausanne). 2026;17:1852512
Background: Interstitial fibrosis and tubular atrophy (IFTA) are key pathological features of chronic kidney damage and progression in diabetic nephropathy (DN). Early identification of patients at higher risk of IFTA may support risk stratification, although reliable non-invasive tools remain limited. This study aimed to develop and validate machine learning (ML) models for predicting IFTA in patients with biopsy-confirmed DN.
Methods: In this retrospective study, 232 patients with biopsy-confirmed DN from 2017 to 2025 were included and randomly divided into a training cohort (n = 164) and a validation cohort (n = 68). Baseline clinical and laboratory variables were collected. Feature selection was performed using least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation. Seven ML algorithms were developed: logistic regression, support vector machine, random forest, XGBoost, LightGBM, decision tree, and artificial neural network. Model performance was evaluated using receiver operating characteristic curves, calibration plots, and decision curve analysis. Model interpretability was assessed using SHapley Additive exPlanations (SHAP).
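Decision curve analysis, mentioned above, compares models by their net benefit at a chosen risk threshold. The following is a minimal sketch of the standard net benefit formula, not the authors' implementation; the function names and the toy labels and probabilities are hypothetical and serve only to illustrate the calculation.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold.

    Standard decision-curve formula: TP/n - FP/n * (pt / (1 - pt)),
    where pt is the risk threshold.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1 - threshold))


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that flags every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * (threshold / (1 - threshold))


# Hypothetical toy data (NOT from the study): 1 = IFTA present, 0 = absent.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]

model_nb = net_benefit(toy_labels, toy_probs, threshold=0.5)
treat_all_nb = net_benefit_treat_all(toy_labels, threshold=0.5)
print(f"model: {model_nb:.3f}, treat-all: {treat_all_nb:.3f}")
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy (net benefit of zero).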
Results: Seven predictors were identified: diabetic retinopathy, age, proteinuria, estimated glomerular filtration rate (eGFR), triglycerides, duration of diabetes, and hemoglobin. Among the models, XGBoost achieved the highest area under the curve (AUC) in the validation cohort (0.759), with an accuracy of 72.1%, sensitivity of 92.3%, specificity of 44.8%, and F1 score of 79.1%. Overall, the model showed moderate discrimination, with high sensitivity but limited specificity, suggesting potential value for exploratory risk screening rather than definitive clinical use. SHAP analysis indicated that higher proteinuria, higher triglycerides, the presence of diabetic retinopathy, and longer diabetes duration, together with lower eGFR, lower hemoglobin, and younger age, were associated with an increased predicted risk of IFTA.
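The reported validation metrics can be checked for internal consistency with a short calculation. The sketch below uses confusion-matrix counts that are an inference, not figures reported in the abstract: they are one set of counts for n = 68 that reproduces the stated percentages. The function name is likewise hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return standard binary-classification metrics as percentages (1 d.p.)."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": round(100 * tp / (tp + fn), 1),
        "specificity": round(100 * tn / (tn + fp), 1),
        "accuracy": round(100 * (tp + tn) / total, 1),
        "f1": round(100 * 2 * tp / (2 * tp + fp + fn), 1),
    }


# ASSUMED counts (inferred, not reported): 39 patients with IFTA, 29 without.
inferred = classification_metrics(tp=36, fn=3, tn=13, fp=16)
print(inferred)
```

Under these assumed counts, 16 of 29 patients without IFTA would be flagged as positive, which illustrates the limited specificity noted in the results.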
Conclusion: ML models, particularly XGBoost, showed moderate performance in predicting IFTA in patients with biopsy-confirmed DN using routinely available clinical variables. These findings support the feasibility of an interpretable, non-invasive approach for exploratory risk estimation of tubulointerstitial injury. However, because of the modest sample size, limited specificity, relatively high false positive rate, and lack of external validation, the present results should be considered preliminary and require further validation before clinical use.
Keywords: SHAP; diabetic nephropathy; interstitial fibrosis; machine learning; tubular atrophy