Cureus. 2025 Mar;17(3): e80933
INTRODUCTION: With its rising prevalence and serious complications, type 2 diabetes mellitus (T2DM) is a major worldwide health burden that calls for early detection using non-invasive screening techniques. Existing screening tests, including the oral glucose tolerance test (OGTT), glycated hemoglobin (HbA1c), and fasting plasma glucose, have drawbacks in terms of accessibility, expense, and invasiveness. Recent developments in heart rate variability (HRV) analysis and machine learning (ML) offer a possible non-invasive alternative for diabetes screening. Previous HRV-based ML models for diabetes classification have shown limited generalizability. The objective of this study is to develop and validate ML models that use time-domain, frequency-domain, and nonlinear HRV features to improve the prediction of T2DM. The study also compares the performance of the developed ML models against existing ML models.
METHOD: A retrospective dataset comprising 519 individuals (261 T2DM patients and 258 non-diabetic controls) was collected from the Autonomic Function Testing (AFT) laboratory repositories. Post-hoc matching was used to ensure that the groups were comparable in age, gender, height, and weight. HRV features were extracted from five-minute ECG recordings using the PowerLab data acquisition system and LabChart HRV module (ADInstruments, Sydney, Australia), following the European Society of Cardiology Task Force guidelines. An 80:20 train-test split was used to train and assess the ML models: Logistic Regression, K-Nearest Neighbors (KNN), Random Forest, Gradient Boosting, XGBoost, LightGBM, CatBoost, and AdaBoost. The performance indicators were accuracy, precision, recall (sensitivity), specificity, F1-score, and area under the curve (AUC) of the receiver operating characteristic (ROC). GridSearchCV was used for hyperparameter tuning to maximize model performance.
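As an illustration of how the threshold-based performance indicators listed above relate to one another, the following minimal sketch derives them from the four cells of a binary confusion matrix. The function name `classification_metrics` and the example counts are hypothetical and are not results from this study; AUC is omitted because it requires ranked prediction scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive common binary-classification metrics from confusion-matrix counts.

    tp/fn: diabetic cases predicted correctly/incorrectly,
    tn/fp: non-diabetic cases predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall and sensitivity are the same quantity: TP / (TP + FN).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for a held-out test set (illustrative only).
metrics = classification_metrics(tp=45, fp=5, tn=47, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because sensitivity and recall are computed from the same cells, they always coincide for a binary classifier.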
RESULTS: The baseline characteristics of the non-diabetic and T2DM groups were similar (p>0.05). HRV analysis showed substantial decreases in the diabetic group's time-domain (SDNN, SD of normal-to-normal intervals; RMSSD, root mean square of successive differences), frequency-domain (low/high frequency, LF/HF), and nonlinear (SD2, SD of the Poincaré plot; CVRR, coefficient of variation of R-R intervals) parameters (p<0.001). CatBoost initially outperformed the other ML models, with an accuracy of 91.2% and an AUC of 0.91; LightGBM and Random Forest, which demonstrated high sensitivity and specificity, trailed closely behind. After hyperparameter tuning, performance improved further: KNN achieved the highest accuracy (98.2%) and AUC (0.99), followed by Random Forest (96.4%) and CatBoost (94.5%). According to correlation analysis, the most important HRV characteristics for diabetes prediction were SD2, SDRR (SD of R-R intervals), and CVRR.
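For readers unfamiliar with the HRV parameters named above, the sketch below computes several of them from a series of R-R intervals using common textbook definitions. It is an assumption-laden illustration: the function name `hrv_features` and the sample intervals are hypothetical, and the LabChart HRV module used in the study may apply different conventions (for example, beat filtering or population rather than sample standard deviation). For an artifact-free series, SDNN and SDRR are computed identically here.

```python
import math
import statistics


def hrv_features(rr_ms):
    """Compute basic HRV features from R-R intervals given in milliseconds."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three R-R intervals")
    # Successive differences between adjacent R-R intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    mean_rr = statistics.mean(rr_ms)
    sdrr = statistics.stdev(rr_ms)  # sample SD of R-R intervals
    rmssd = math.sqrt(statistics.mean([d * d for d in diffs]))
    var_diff = statistics.variance(diffs)
    # Poincare plot descriptors: SD1 (short-term) and SD2 (long-term).
    sd1 = math.sqrt(0.5 * var_diff)
    # Guard against a slightly negative value on very short, erratic series.
    sd2 = math.sqrt(max(0.0, 2.0 * sdrr ** 2 - 0.5 * var_diff))
    cvrr = 100.0 * sdrr / mean_rr  # coefficient of variation, in percent
    return {"mean_rr": mean_rr, "sdrr": sdrr, "rmssd": rmssd,
            "sd1": sd1, "sd2": sd2, "cvrr": cvrr}


# Hypothetical R-R intervals in milliseconds (illustrative only).
example_rr = [800.0, 820.0, 850.0, 870.0, 860.0, 830.0, 810.0, 790.0]
for name, value in hrv_features(example_rr).items():
    print(f"{name}: {value:.2f}")
```

A real five-minute recording contains several hundred intervals; the short series here only demonstrates the arithmetic.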
CONCLUSION: This study validates the utility of HRV-based ML models for non-invasive T2DM prediction, with ensemble models like CatBoost and LightGBM demonstrating superior performance compared with previously reported ML models. When integrated with wearable medical technology for real-time monitoring, the optimized ML model could offer a scalable, affordable, and non-invasive alternative for diabetes screening. To improve generalizability and clinical use, future studies should investigate wearable-based HRV monitoring, multimodal AI models, and longitudinal validation.
Keywords: cardiac autonomic neuropathy; diabetes mellitus; heart rate variability; machine learning; non-invasive screening