High-Recall URL Phishing Detection via Multilayer Perceptron: Feature Selection, Learning Curves, and Confusion-Matrix Verification
DOI: https://doi.org/10.65780/bima.v1i2.9
Keywords: Phishing Detection, URL-Based Classification, Multilayer Perceptron, Machine Learning, Feature Selection, Cybersecurity
Abstract
Phishing attacks that exploit malicious URLs remain a significant and growing threat in the modern digital ecosystem because they are cheap to operate, highly scalable, and effective at deceiving users. As a growing number of online services support critical activities such as banking, e-commerce, government, and education, the need for fast, accurate, and lightweight phishing detection mechanisms is becoming increasingly urgent. This study proposes an end-to-end URL-based phishing detection framework that emphasizes reproducibility, robustness, and operational feasibility, with a particular focus on the Multilayer Perceptron (MLP) classifier. Using the PhiUSIIL phishing URL dataset, this research evaluates the performance of MLP against nine widely used machine learning algorithms, including linear, probabilistic, tree-based, and ensemble models. The methodology integrates systematic data cleaning, hierarchical data partitioning, feature normalization, ANOVA-based feature selection, and class imbalance handling to ensure fair and consistent evaluation. Model performance is assessed using accuracy, precision, recall, and F1-score, complemented by learning curve analysis and confusion matrix verification to examine generalization stability and critical error patterns. Experimental results show that while most models achieve very high overall performance, the MLP classifier consistently demonstrates superior stability and detection capability, achieving 99.98% accuracy, 99.97% precision, 100% recall, and a 99.98% F1-score, with zero false negatives in phishing classification. These findings confirm that lexical and structural URL features alone are sufficient for effective phishing detection and highlight MLP as a practical, efficient, and reliable model for application in large-scale, real-time cybersecurity environments.
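The four reported metrics all derive from the confusion matrix, which is why the abstract can tie 100% recall directly to zero false negatives. The following is a minimal sketch of that relationship, not the authors' evaluation code. It assumes phishing is treated as the positive class, and the class name `ConfusionMatrix` and the counts in `example` are hypothetical values chosen for illustration; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix, assuming phishing is the positive class."""
    tp: int  # phishing URLs correctly flagged as phishing
    fp: int  # legitimate URLs wrongly flagged as phishing
    fn: int  # phishing URLs missed (classified as legitimate)
    tn: int  # legitimate URLs correctly passed

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        actual_phishing = self.tp + self.fn
        return self.tp / actual_phishing if actual_phishing else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in count form.
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


# Hypothetical counts for illustration only (not the paper's results).
example = ConfusionMatrix(tp=1000, fp=2, fn=0, tn=998)

print(f"accuracy:  {example.accuracy():.4%}")
print(f"precision: {example.precision():.4%}")
print(f"recall:    {example.recall():.4%}")
print(f"f1:        {example.f1():.4%}")
```

With `fn=0`, recall is exactly 1.0 regardless of the other counts, while precision and accuracy still drop slightly for every false positive. This is the pattern the abstract describes: perfect recall alongside precision just below 100%.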














