Hyperband‑Optimized LightGBM and Ensemble Learning for Web Phishing Detection with SHAP‑Based Interpretability

Rizki Wahyudi(1)


(1) Universitas Amikom Purwokerto

Abstract


This study evaluates three tree-based ensemble algorithms, Random Forest (RF), XGBoost (XGB), and LightGBM (LGBM), for detecting phishing websites, using a phishing dataset built from HTML, URL, and network features. Two performance-improvement strategies were tested: hyperparameter tuning with Hyperband search (HalvingRandomSearchCV) and a stacking ensemble that combines all three models. Models were evaluated on five metrics: accuracy, precision, recall, F1-score, and AUC‑ROC. LightGBM tuned via Hyperband achieved the highest performance (accuracy 0.9724; AUC‑ROC 0.9702), followed by the tuned stacking ensemble (accuracy 0.9697; AUC‑ROC 0.9684). SHAP analysis was used to interpret the contribution of key features to the phishing predictions. The tuned LightGBM improves AUC‑ROC by 0.0034 over the XGBoost baseline (0.9668), which suggests that Hyperband tuning and stacking ensembles are effective for phishing detection.
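As a rough illustration of how the five reported metrics are defined, the following minimal sketch computes them from binary labels and predicted phishing scores. It is not the author's evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration only. The last line simply rechecks the AUC‑ROC gap quoted in the abstract.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 (phishing) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1-score, and AUC-ROC."""
    # Assumed decision rule: scores at or above the threshold are labeled phishing.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc_roc": auc_roc(y_true, scores),
    }


# Toy data (hypothetical, not taken from the study's dataset).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
metrics = evaluate(y_true, scores)
print(metrics)

# AUC-ROC gap between tuned LightGBM and the XGBoost baseline, using the
# two values reported in the abstract.
auc_gap = round(0.9702 - 0.9668, 4)
print(auc_gap)
```

Note that accuracy, precision, recall, and F1-score depend on the chosen threshold, whereas AUC‑ROC depends only on how the scores rank phishing pages against legitimate ones.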


Keywords


Phishing Detection; Machine Learning; Hyperparameter Tuning; Stacking Ensemble; SHAP Interpretability





Journal of Computer Science and Engineering (JCSE)
ISSN 2721-0251 (online)
Published by: ICSE (Institute of Computer Sciences and Engineering)
Website: http://icsejournal.com/index.php/JCSE/
Email: jcse@icsejournal.com

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.