High-Recall URL Phishing Detection via Multilayer Perceptron: Feature Selection, Learning Curves, and Confusion-Matrix Verification

Authors

  • Yoga Rizki Rahmawan, Universitas Informatika dan Bisnis Indonesia
  • Hadi Nurjaman, Universitas Informatika dan Bisnis Indonesia
  • Febri Faturahman Ramadhan, Universitas Informatika dan Bisnis Indonesia

        DOI:

        https://doi.org/10.65780/bima.v1i2.9

        Keywords:

        Phishing Detection, URL-Based Classification, Multilayer Perceptron, Machine Learning, Feature Selection, Cybersecurity

        Abstract

        Phishing attacks that exploit malicious URLs remain a significant and growing threat in the modern digital ecosystem because they are cheap to operate, highly scalable, and effective at deceiving users. As online services increasingly support critical activities such as banking, e-commerce, government, and education, the need for fast, accurate, and lightweight phishing detection mechanisms is becoming more urgent. This study proposes an end-to-end URL-based phishing detection framework that emphasizes reproducibility, robustness, and operational feasibility, with a particular focus on the Multilayer Perceptron (MLP) classifier. Using the PhiUSIIL phishing URL dataset, the study evaluates the MLP against nine widely used machine learning algorithms, including linear, probabilistic, tree-based, and ensemble models. The methodology integrates systematic data cleaning, hierarchical data partitioning, feature normalization, ANOVA-based feature selection, and class imbalance handling to ensure fair and consistent evaluation. Model performance is assessed using accuracy, precision, recall, and F1-score, complemented by learning curve analysis and confusion matrix verification to examine generalization stability and critical error patterns. Experimental results show that while most models achieve very high overall performance, the MLP classifier consistently demonstrates superior stability and detection capability, achieving 99.98% accuracy, 99.97% precision, 100% recall, and a 99.98% F1-score, with zero false negatives in phishing classification. These findings indicate that lexical and structural URL features alone are sufficient for effective phishing detection on this dataset and highlight the MLP as a practical, efficient, and reliable model for large-scale, real-time cybersecurity environments.
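The four metrics named in the abstract all follow from the counts in a binary confusion matrix, which is why a confusion-matrix check is a useful cross-check on headline figures such as 100% recall. The sketch below is illustrative only: it assumes phishing is treated as the positive class, and the class name `ConfusionCounts`, the function `summarize`, and the example counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (assumption: phishing is the positive class)."""
    tp: int  # phishing URLs flagged as phishing
    fp: int  # legitimate URLs flagged as phishing
    fn: int  # phishing URLs that were missed (the critical error)
    tn: int  # legitimate URLs passed as legitimate


def summarize(c: ConfusionCounts) -> dict:
    """Derive accuracy, precision, recall, and F1 from the four counts."""
    total = c.tp + c.fp + c.fn + c.tn
    predicted_pos = c.tp + c.fp
    actual_pos = c.tp + c.fn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / predicted_pos if predicted_pos else 0.0
    recall = c.tp / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts chosen for illustration; NOT the paper's confusion matrix.
counts = ConfusionCounts(tp=10_000, fp=3, fn=0, tn=13_000)
metrics = summarize(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.4%}")
```

With zero false negatives, recall is exactly 1 regardless of the other counts, while precision and accuracy still depend on how many legitimate URLs are wrongly flagged. This matches the pattern reported in the abstract, where recall is 100% and precision is slightly lower.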

        Published

        2026-01-31

        How to Cite

        High-Recall URL Phishing Detection via Multilayer Perceptron: Feature Selection, Learning Curves, and Confusion-Matrix Verification. (2026). Bulletin of Intelligent Machines and Algorithms, 1(2), 52-59. https://doi.org/10.65780/bima.v1i2.9
