Healthcare (Basel). 2026 Jun 15;14(12). pii: 1710. [Epub ahead of print]
Background/Objectives: Type 2 diabetes mellitus (T2DM) is a prevalent metabolic disorder associated with substantial long-term morbidity and mortality. Routinely collected anthropometric, biochemical, and hematological variables may carry discriminatory information that data-driven classifiers can exploit. This study compared the apparent classification performance of multiple machine learning algorithms in distinguishing individuals with and without T2DM using routinely obtained clinical parameters in a single-center dataset. Methods: This single-center observational study included 160 adults (95 females, 65 males) evaluated at the Endocrinology Outpatient Clinic of Gaziantep Islam Science and Technology University, Faculty of Medicine, Ersin Arslan Training and Research Hospital. The dataset comprised anthropometric measurements, biochemical markers, and complete blood count parameters. Eight supervised machine learning models were evaluated with and without HbA1c as a predictor: Logistic Regression, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Decision Tree, Random Forest, Extra Trees, Gaussian Naive Bayes, and k-Nearest Neighbors. Models were evaluated using stratified 5-fold cross-validation. Preprocessing, comprising imputation of missing values and, where appropriate, feature standardization, was performed within each fold, and the synthetic minority oversampling technique (SMOTE) was applied only to the training folds to address class imbalance without information leakage. Model performance was assessed using accuracy, sensitivity, specificity, and F1 score, and receiver operating characteristic (ROC) curves were examined as a diagnostic tool. SHapley Additive exPlanations (SHAP) analysis was used to interpret model predictions, and a calibration curve was used to assess the reliability of the predicted probabilities. Results: Random Forest demonstrated the best classification performance in the cross-validated evaluation; without HbA1c, it achieved 92.2% accuracy, 93.9% sensitivity, 97.9% specificity, and a 95.9% F1 score. When HbA1c was included, it achieved 98.0% accuracy, 97.9% sensitivity, 98.8% specificity, and a 99.0% F1 score. Decision Tree and Extra Trees also performed strongly, with accuracies of 87.6% and 92.8%, respectively, without HbA1c, and 90% and 93.5% when HbA1c was included; in contrast, k-Nearest Neighbors yielded the lowest accuracy (70.6%). Overall, tree-based models performed better than linear classifiers on this dataset. Conclusions: Machine learning models based on routine clinical and anthropometric variables showed promising performance for T2DM classification in this single-center dataset, with tree-based approaches performing best. Including HbA1c improved the models' ability to distinguish individuals with and without T2DM. However, because HbA1c served both as a predictor and as part of the operational definition of the diabetes group, the findings obtained with HbA1c carry a risk of target leakage and should be interpreted with caution. These results should therefore be considered exploratory rather than evidence of clinically applicable predictive performance, and independent external validation is needed before clinical application.
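The leakage-avoidance step described in the Methods (oversampling only inside the training folds of a stratified cross-validation) can be illustrated with a minimal, standard-library sketch. This is not the study's code. All function names are hypothetical, the oversampler is a simplified SMOTE-style interpolation between random minority pairs (real SMOTE interpolates toward nearest minority neighbours), a nearest-centroid rule stands in for the eight classifiers, imputation and standardization are omitted, and confusion counts are pooled across folds, which may differ from how the reported metrics were aggregated.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds, spreading each class evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, i in enumerate(indices):
            folds[position % k].append(i)
    return folds


def oversample_minority(X, y, rng):
    """Simplified SMOTE-style step (assumption: random minority pairs, no k-NN)."""
    counts = defaultdict(int)
    for label in y:
        counts[label] += 1
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(deficit):
        a, b = rng.choice(minority_rows), rng.choice(minority_rows)
        t = rng.random()  # synthetic point lies on the segment between a and b
        X_out.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y_out.append(minority)
    return X_out, y_out


def fit_centroids(X, y):
    """Placeholder classifier: one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        previous = sums.get(label, [0.0] * len(row))
        sums[label] = [s + v for s, v in zip(previous, row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in total] for c, total in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((v - m) ** 2 for v, m in zip(row, centroids[c])))


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and F1 from confusion counts."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def cross_validate(X, y, k=5, seed=0, positive=1):
    """Stratified k-fold CV; synthetic rows are added to training data only."""
    rng = random.Random(seed)
    tp = fp = tn = fn = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(y)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        # Oversampling happens after the split, so test rows stay untouched.
        train_X, train_y = oversample_minority(train_X, train_y, rng)
        centroids = fit_centroids(train_X, train_y)
        for i in test_idx:
            predicted_positive = predict(centroids, X[i]) == positive
            actual_positive = y[i] == positive
            if predicted_positive and actual_positive:
                tp += 1
            elif predicted_positive:
                fp += 1
            elif actual_positive:
                fn += 1
            else:
                tn += 1
    return classification_metrics(tp, fp, tn, fn)
```

Calling `cross_validate(X, y)` with `X` as a list of numeric feature rows and `y` as a list of 0/1 labels returns a dictionary of the four metrics. The design point is the ordering: the held-out fold is set aside first, and synthetic minority rows are generated from training rows alone, so no information from test rows enters model fitting.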
Keywords: HbA1c; artificial intelligence; complete blood count; decision tree; machine learning; random forest; type 2 diabetes mellitus