Early-Stage Diabetes Prediction Using Machine Learning with GridSearchCV Hyperparameter Tuning
DOI: https://doi.org/10.65780/bima.v1i3.15
Keywords: Diabetes Prediction, Machine Learning, Hyperparameter Tuning, Random Forest, XGBoost, LightGBM
Abstract
This study evaluates ensemble-based machine learning models for early-stage diabetes prediction. Three classifiers (Random Forest, XGBoost, and LightGBM) were assessed in baseline and GridSearchCV-tuned configurations on an 80–20 train–test split. Performance was measured with accuracy, precision, recall, and F1-score. All models achieved high predictive performance, with the best test accuracy reaching 99.04%. Random Forest produced stable, consistent results and showed no significant improvement after tuning. XGBoost improved after hyperparameter optimization and generalized better to the held-out test data. LightGBM achieved competitive baseline performance but declined slightly after tuning. Learning-curve analysis indicates that all models benefit from additional training data, with overfitting decreasing as the training set grows. Overall, Random Forest and tuned XGBoost emerged as the most reliable models for early-stage diabetes prediction, combining strong generalization with high classification accuracy.
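The evaluation workflow summarized above can be sketched in plain Python. It holds out 20% of the data, tunes a classifier by exhaustive grid search with cross-validation on the training portion, then reports accuracy, precision, recall, and F1 on the held-out set. This is a minimal illustration under stated assumptions, not the authors' code. In practice scikit-learn's `GridSearchCV` performs the tuning step, but here the search is written out by hand so the block runs without scikit-learn, XGBoost, or LightGBM. The k-nearest-neighbour model (`knn_fit_predict`), its `n_neighbors` grid, the helper names, and the toy one-feature dataset are all hypothetical stand-ins, not the paper's classifiers or data.

```python
import itertools
import random


def train_test_split(X, y, test_size=0.2, seed=42):
    """Shuffle once with a fixed seed and hold out `test_size` of the rows (80-20 by default)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(idx) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(ids, seq):
        return [seq[i] for i in ids]

    return pick(train_idx, X), pick(test_idx, X), pick(train_idx, y), pick(test_idx, y)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall and F1 for the positive (diabetic) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def grid_search_cv(fit_predict, param_grid, X, y, k=5):
    """Try every parameter combination; keep the one with the best mean k-fold accuracy."""
    keys = sorted(param_grid)
    folds = [list(range(i, len(X), k)) for i in range(k)]  # interleaved, non-stratified folds
    best_params, best_score = None, -1.0
    for values in itertools.product(*(param_grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            preds = fit_predict(params, [X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in fold])
            scores.append(binary_metrics([y[i] for i in fold], preds)["accuracy"])
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:  # strict ">" keeps the first combination on ties
            best_params, best_score = params, mean_score
    return best_params, best_score


def knn_fit_predict(params, X_train, y_train, X_test):
    """Hypothetical stand-in classifier: 1-D k-nearest neighbours with a majority vote."""
    k = params["n_neighbors"]
    preds = []
    for x in X_test:
        nearest = sorted(range(len(X_train)), key=lambda i: abs(X_train[i] - x))[:k]
        votes = sum(y_train[i] for i in nearest)
        preds.append(1 if 2 * votes > k else 0)
    return preds


# Hypothetical toy data: one feature, two well-separated classes of 40 rows each.
X = [float(v) for v in range(40)] + [float(v) for v in range(60, 100)]
y = [1 if v >= 50 else 0 for v in X]

X_tr, X_te, y_tr, y_te = train_test_split(X, y)  # 64 train / 16 test
best, cv_acc = grid_search_cv(knn_fit_predict, {"n_neighbors": [1, 3, 5]}, X_tr, y_tr)
report = binary_metrics(y_te, knn_fit_predict(best, X_tr, y_tr, X_te))  # refit on full train split
print(best, round(cv_acc, 4), report)
```

To reproduce the baseline-versus-tuned comparison, you would replace `knn_fit_predict` with a wrapper around a Random Forest, XGBoost, or LightGBM model and supply that model's hyperparameter grid. The baseline corresponds to fitting with default parameters and skipping the search.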