Sci Rep. 2025 Nov 05. 15(1): 38720
Diabetes mellitus is a major global health burden, and early identification of insulin dependency is important for timely intervention. This study developed an artificial intelligence-based diagnostic system using a real-world clinical dataset of 100 anonymized patient records, collected with ethical approval and informed consent. The dataset included demographic, lifestyle, and biochemical variables such as glycated hemoglobin (HbA1c), fasting blood sugar (FBS), and postprandial blood sugar (PPBS). After preprocessing to handle missing values, normalize continuous variables, and encode categorical features, four machine learning models were implemented: Logistic Regression, Random Forest, XGBoost, and LightGBM, along with ensemble-based combined approaches. Models were evaluated using 5-fold cross-validation, with accuracy, precision, recall, and F1-score as metrics. XGBoost achieved the highest performance (accuracy 0.88, precision 0.86, recall 0.90, F1-score 0.88), followed by LightGBM (accuracy 0.85, F1-score 0.84), Random Forest (accuracy 0.82, F1-score 0.81), and Logistic Regression (accuracy 0.76, F1-score 0.74). The most predictive features were PPBS and HbA1c, consistent with clinical understanding. While the results are promising, they reflect a single-center dataset of 100 records and should be interpreted as preliminary; further study will include external validation on larger, multi-site cohorts prior to clinical adoption.
Keywords: Diabetes; Ensemble learning; Insulin dependency; LightGBM; Machine learning; Predictive modeling
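
The evaluation metrics and 5-fold split named in the abstract can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the helper names (`classification_metrics`, `f1_from_precision_recall`, `kfold_indices`) are hypothetical, and the contiguous, unshuffled split is an assumption, since the abstract does not say whether folds were shuffled or stratified. The only figures taken from the abstract are the 100 records, the 5 folds, and the XGBoost precision (0.86) and recall (0.90).

```python
from typing import Dict, List, Sequence, Tuple


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from_precision_recall(precision, recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def kfold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits over range(n_samples).

    Assumption: contiguous folds with no shuffling or stratification.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


# Consistency check on the reported XGBoost figures (precision 0.86, recall 0.90).
xgb_f1 = f1_from_precision_recall(0.86, 0.90)
print(round(xgb_f1, 2))  # 0.88

# With 100 records and 5 folds, each fold holds out 20 records and trains on 80.
splits = kfold_indices(100, 5)
print([(len(train), len(test)) for train, test in splits])
```

The harmonic-mean check reproduces the reported XGBoost F1-score of 0.88. The split sizes also show that each fold's metrics rest on only 20 held-out records, which is one reason the results should be read as preliminary.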