High-Recall URL Phishing Detection via Multilayer Perceptron: Feature Selection, Learning Curves, and Confusion-Matrix Verification
DOI:
https://doi.org/10.65780/bima.v1i2.9
Keywords:
Phishing Detection, URL-Based Classification, Multilayer Perceptron, Machine Learning, Feature Selection, Cybersecurity
Abstract
Phishing attacks that exploit malicious URLs remain a significant and growing threat in the modern digital ecosystem because they are cheap to operate, scale easily, and deceive users effectively. As online services increasingly support critical activities such as banking, e-commerce, government, and education, the need for fast, accurate, and lightweight phishing detection mechanisms is becoming urgent. This study proposes an end-to-end URL-based phishing detection framework that emphasizes reproducibility, robustness, and operational feasibility, with a particular focus on the Multilayer Perceptron (MLP) classifier. Using the PhiUSIIL phishing URL dataset, the study evaluates the MLP against nine widely used machine learning algorithms, including linear, probabilistic, tree-based, and ensemble models. The methodology integrates systematic data cleaning, hierarchical data partitioning, feature normalization, ANOVA-based feature selection, and class imbalance handling to ensure fair and consistent evaluation. Model performance is assessed using accuracy, precision, recall, and F1-score, complemented by learning curve analysis and confusion matrix verification to examine generalization stability and critical error patterns. Experimental results show that, while most models achieve very high overall performance, the MLP classifier consistently demonstrates superior stability and detection capability, achieving 99.98% accuracy, 99.97% precision, 100% recall, and a 99.98% F1-score, with zero false negatives in phishing classification. These findings confirm that lexical and structural URL features alone are sufficient for effective phishing detection and highlight the MLP as a practical, efficient, and reliable model for large-scale, real-time cybersecurity environments.
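The four reported metrics all derive from the confusion matrix, which is why zero false negatives corresponds directly to 100% recall. The following is a minimal sketch of that relationship, not the authors' evaluation code. The function name `metrics_from_confusion` and the counts passed to it are hypothetical and are not taken from the paper, and phishing is assumed to be the positive class.

```python
def metrics_from_confusion(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 for the positive class.

    Assumption: phishing is the positive class, so a false negative (fn)
    is a phishing URL that was classified as legitimate.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not the paper's confusion matrix):
# every phishing URL is caught (fn = 0), and a few legitimate URLs are flagged.
example = metrics_from_confusion(tp=10000, fp=3, fn=0, tn=9997)

for name, value in example.items():
    print(f"{name}: {value:.4%}")
```

With `fn = 0`, recall is exactly 1 regardless of the other counts, so the F1-score is then limited only by precision, that is, by the number of legitimate URLs wrongly flagged as phishing.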


