JMIR Med Inform. 2025 Oct 10;13:e71994.
Veerle Y van Velze,
Hendrico L Burger,
Tim J van der Steenhoven,
Hani Al-Ers,
Lauren N Goncalves,
Daniël Eefting,
Willem-Jan J de Jong,
Harm J Smeets,
Janna C Specken Welleweerd,
Joost R van der Vorst,
Sandy Uchtmann,
Robert Rissmann,
Jaap F Hamming,
Lampros Stergioulas,
Koen EA van der Bogt.
BACKGROUND: Machine learning (ML) has shown great potential in recognizing complex disease patterns and supporting clinical decision-making. Diabetic foot ulcers (DFUs) represent a significant multifactorial medical problem with high incidence and severe outcomes, providing an ideal example for a comprehensive framework that encompasses all essential steps for implementing ML in a clinically relevant fashion.
OBJECTIVE: This paper aims to provide a framework for the proper use of ML algorithms to predict clinical outcomes of multifactorial diseases and their treatments.
METHODS: ML models were compared on a DFU dataset. Patient characteristics associated with wound healing were selected based on the outcomes of statistical tests (ANOVA and the chi-square test) and validated against expert recommendations. Imputation and balancing of patient records were performed with MIDAS (Multiple Imputation with Denoising Autoencoders) Touch and adaptive synthetic sampling, respectively. Logistic regression, support vector machine (SVM), k-nearest neighbors, random forest (RF), extreme gradient boosting (XGBoost), Bayesian additive regression trees, and an artificial neural network were trained, cross-validated, and optimized using random sampling on the patient dataset. Model calibration and clinical utility were evaluated with calibration curves, Brier scores, and decision curve analysis (DCA).
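As an illustration of the chi-square screening step for a categorical characteristic, the following is a minimal sketch, not the authors' implementation. The function names and the counts in the usage example are hypothetical and are not taken from the study; the p-value helper is only valid for a 2 x 2 table (1 degree of freedom), and every expected cell count is assumed to be nonzero.

```python
import math


def chi_square_statistic(table):
    """Return the Pearson chi-square statistic and degrees of freedom
    for an r x c contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail p-value for a chi-square statistic with exactly
    1 degree of freedom: P(X > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts (NOT study data):
# rows = characteristic present / absent, columns = healed / not healed.
example_table = [[10, 20], [20, 10]]
stat, dof = chi_square_statistic(example_table)
p_value = p_value_one_dof(stat)
```

A characteristic would be retained for modeling when its p-value falls below a chosen significance level; the abstract does not state the level used, so any cutoff applied here would be an assumption.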
RESULTS: The exploratory dataset consisted of 700 patient records with 199 variables. After dataset cleaning, the variables used for model training were age, smoking status, toe systolic pressure, blood pressure, oxygen saturation, hemoglobin, hemoglobin A1c, estimated glomerular filtration rate, wound location, diabetes type, Texas wound classification, neuropathy, and wound area measurement. The SVM obtained a stable accuracy of 0.853 (95% CI 0.810-0.896) with an area under the receiver operating characteristic curve of 0.922 (95% CI 0.889-0.955). The RF and XGBoost achieved accuracies of 0.838 (95% CI 0.793-0.883) and 0.815 (95% CI 0.768-0.862), respectively, with areas under the receiver operating characteristic curve of 0.917 (95% CI 0.883-0.951) for RF and 0.889 (95% CI 0.849-0.929) for XGBoost. SVM, RF, and XGBoost were well-calibrated, with average Brier scores around 0.127 (SD 0.013). DCA showed that the SVM provided the highest net clinical benefit across relevant risk thresholds.
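For readers unfamiliar with the two evaluation measures, the sketch below shows how a Brier score and the net benefit used in DCA are conventionally computed for a binary outcome. It is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the labels and probabilities are made-up values rather than study data, and a patient is counted as predicted positive when the predicted probability is at or above the threshold.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and
    observed binary outcomes (0 or 1); lower is better."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at a risk threshold pt (0 <= pt < 1):
    TP/n - (FP/n) * pt / (1 - pt)."""
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob)
                    if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


# Made-up labels and predicted probabilities (NOT study data).
labels = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.6, 0.7]

score = brier_score(labels, probabilities)
benefit_at_half = net_benefit(labels, probabilities, 0.5)
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the treat-all and treat-none strategies; which thresholds count as clinically relevant is a judgment that the abstract does not specify.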
CONCLUSIONS: Handling missing values, selecting features, and addressing class imbalance are key steps in developing ML applications for clinical research. Seven models, each representing a different branch of ML, were compared on their power to predict complete wound healing. In this initial DFU dataset used as an example, the SVM achieved the best performance in predicting clinical outcomes, followed by RF and XGBoost. The calibration and clinical utility of the models were assessed through calibration curves, Brier scores, and DCA, demonstrating their potential relevance in clinical decision-making.
Keywords: Bayesian additive regression trees; artificial neural network; complete wound healing; diabetic foot ulcer; extreme gradient boosting; k-nearest neighbor; logistic regression; machine learning; random forest; support vector machine