Ensemble Learning for Early Warning Systems in Higher Education: A Comparative Study of Student Attrition
DOI: https://doi.org/10.65780/bima.v1i3.19

Keywords: Ensemble Learning, Student Attrition, Early Warning System, Higher Education, Machine Learning

Abstract
Student attrition poses a substantial challenge to higher education institutions, affecting both their reputation and their financial sustainability. Conventional single machine learning models often exhibit limited sensitivity on educational data, which is typically marked by severe class imbalance favoring graduating students over dropouts. This study introduces an Early Warning System based on a Hybrid Stacking Ensemble framework to improve student attrition prediction. The approach leverages the complementary biases of Bagging and Boosting as base learners, whose outputs are combined by a Logistic Regression meta-learner that refines the prediction weights. To counteract class imbalance and majority-class bias, the Synthetic Minority Over-sampling Technique (SMOTE) was applied during preprocessing. Empirical evaluations show that the Hybrid Stacking Ensemble attains a classification accuracy of 88.81% and a recall of 80.99%, surpassing standalone models and other ensemble methods. Feature importance rankings highlight second-semester academic performance and administrative-financial factors, particularly tuition payment punctuality, as key dropout predictors. These results affirm the value of integrating diverse classifiers to discern intricate, nonlinear patterns in student behavior. In essence, this work establishes a reliable, evidence-based framework that enables administrators to shift from reactive to proactive, precision-targeted strategies that foster student retention and institutional success.
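The pipeline described above can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the synthetic dataset, hyperparameters, and the base-learner choices (a random forest standing in for Bagging, gradient boosting for Boosting) are assumptions, and SMOTE is replaced by plain random oversampling so the sketch needs only scikit-learn.

```python
# Hypothetical sketch of a Hybrid Stacking Ensemble for dropout prediction:
# Bagging- and Boosting-style base learners combined by a Logistic Regression
# meta-learner, trained on a rebalanced training split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score

# Imbalanced toy data: ~90% "graduate" (0) vs ~10% "dropout" (1).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                          test_size=0.25, random_state=0)

# Oversample the minority class on the training split only
# (a simple stand-in for SMOTE, which synthesizes new minority samples).
rng = np.random.default_rng(0)
minority = np.flatnonzero(y_tr == 1)
extra = rng.choice(minority, size=(y_tr == 0).sum() - minority.size,
                   replace=True)
X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, y_tr[extra]])

# Stacking: the meta-learner sees cross-validated base-learner predictions
# and learns how to weight them.
stack = StackingClassifier(
    estimators=[
        ("bagging", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("boosting", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_bal, y_bal)
pred = stack.predict(X_te)
print(f"accuracy={accuracy_score(y_te, pred):.3f} "
      f"recall={recall_score(y_te, pred):.3f}")
```

Note that rebalancing is applied only after the train/test split, so the test set keeps the natural class distribution and the reported recall reflects real-world minority-class sensitivity, mirroring the evaluation protocol the abstract implies.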


