Explainable Machine Learning For Early HIV Detection Using Extra Trees and SHAP Algorithms
DOI:
https://doi.org/10.65780/bima.v1i2.8
Keywords:
HIV; early detection; Extra Trees; Explainable Machine Learning; SHAP
Abstract
Human Immunodeficiency Virus (HIV) remains a global health challenge that requires accurate and reliable early detection. Machine learning can classify HIV status from clinical, demographic, and behavioral data, but the limited interpretability of black-box models is an obstacle to clinical use. This study proposes an Explainable Machine Learning approach for early HIV detection that integrates the Extra Trees algorithm with the SHapley Additive exPlanations (SHAP) method. The model was developed on an HIV dataset obtained from the Kaggle platform and processed through standard data preprocessing stages, without class balancing. Performance was evaluated using classification metrics, confusion matrices, and learning curves to assess accuracy and learning stability. The experimental results show that the Extra Trees model achieved 88% accuracy with strong generalization. SHAP and mean absolute SHAP analyses revealed the dominant features that contributed consistently to the prediction of HIV status at both the global and local levels. These findings indicate that integrating Extra Trees and SHAP yields an early HIV detection model that is not only competitive in performance but also transparent and clinically relevant, and that could support the development of reliable artificial intelligence-based medical decision support systems.
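The local and global SHAP analyses mentioned in the abstract rest on Shapley values. As a minimal sketch of that idea, and not the study's actual pipeline, the block below computes exact Shapley values by brute force for a toy scoring function, then averages their absolute values to obtain a global ranking (the "mean absolute SHAP" quantity). The feature names, the `toy_model` function, and all data values are hypothetical and chosen only for illustration; a real analysis would apply a SHAP tree explainer to the trained Extra Trees model instead.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names; NOT taken from the paper's dataset.
FEATURES = ["age", "cd4_count", "risk_behavior"]


def toy_model(x):
    """Toy stand-in for a trained classifier's score (assumption, not the paper's model)."""
    age, cd4, risk = x
    return 0.1 * age - 0.3 * cd4 + 0.5 * risk + 0.2 * risk * age


def coalition_value(model, x, background, subset):
    """Mean model output when features in `subset` take their values from x
    and every other feature takes its value from a background row."""
    total = 0.0
    for row in background:
        mixed = [x[i] if i in subset else row[i] for i in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values for one sample, enumerating all feature coalitions."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Classic Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


# Made-up, already-scaled rows used only to exercise the functions.
background = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.5, 0.5, 0.0], [1.0, 1.0, 1.0]]
samples = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.5, 0.2, 1.0]]

# Base value: average model output over the background rows.
base_value = sum(toy_model(row) for row in background) / len(background)

# Local explanations: one vector of Shapley values per sample.
local = [shapley_values(toy_model, x, background) for x in samples]

# Global importance: mean absolute Shapley value per feature.
global_importance = {
    name: sum(abs(phi[i]) for phi in local) / len(local)
    for i, name in enumerate(FEATURES)
}

for name, value in sorted(global_importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.4f}")
```

For each sample, the base value plus the sum of its Shapley values reproduces the model output, which is the additivity property that lets the same attributions be read per patient (local) and averaged across patients (global). Brute-force enumeration grows exponentially with the number of features, so it is practical only for small illustrations like this one.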


