Minimal-Feature XSS Detection by SHAP and Importance-Driven Pruning
Abstract
Cross-Site Scripting (XSS) remains a high-impact web threat despite widespread defensive mechanisms. Building on our prior study, we ask how few features are needed to preserve near-ceiling detection quality. Starting from the feature set that remains after correlation-based redundancy removal, we iteratively prune features using CatBoost feature importance and SHAP values, retraining at each step on the same stratified, SMOTENC-balanced splits. The process yields an ultra-compact classifier that uses only four features. On the fixed test set, the model attains Accuracy = 0.984728, MCC = 0.9687, ROC-AUC = 0.9932, AP ≈ 0.99, Precision (macro) = 0.986362, Recall (macro) = 0.982326, and F1 (macro) = 0.98421; the confusion counts are TN = 7023, FP = 25, FN = 159, and TP = 4841. Training and validation losses converge smoothly, and SHAP beeswarm plots show that all four retained features contribute consistently across many instances, which helps explain the strong threshold-free metrics (ROC-AUC and AP). These results demonstrate that accurate, interpretable, and deployment-ready XSS detection is achievable with a minimal feature budget.
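The threshold-dependent metrics quoted in the abstract can be re-derived from the four confusion counts alone. The sketch below is a reader-side check, not the authors' evaluation code. It assumes that "macro" means the unweighted mean of the per-class scores over the two classes (the convention used by scikit-learn), and the function name `macro_metrics` is hypothetical. ROC-AUC and AP cannot be recovered this way because they depend on the full score distribution.

```python
from math import sqrt


def macro_metrics(tn, fp, fn, tp):
    """Derive accuracy, macro precision/recall/F1, and MCC from confusion counts.

    Assumption: macro averaging is the unweighted mean over the two classes.
    """
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total

    # Per-class precision: correct predictions of a class / all predictions of it.
    prec_pos = tp / (tp + fp)
    prec_neg = tn / (tn + fn)

    # Per-class recall: correct predictions of a class / all true members of it.
    rec_pos = tp / (tp + fn)
    rec_neg = tn / (tn + fp)

    # Per-class F1, written directly in terms of counts.
    f1_pos = 2 * tp / (2 * tp + fp + fn)
    f1_neg = 2 * tn / (2 * tn + fp + fn)

    # Matthews correlation coefficient for the binary case.
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    return {
        "accuracy": accuracy,
        "precision_macro": (prec_pos + prec_neg) / 2,
        "recall_macro": (rec_pos + rec_neg) / 2,
        "f1_macro": (f1_pos + f1_neg) / 2,
        "mcc": mcc,
    }


# Confusion counts as reported in the abstract.
metrics = macro_metrics(tn=7023, fp=25, fn=159, tp=4841)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

Under that macro-averaging assumption, the computed values agree with the reported Accuracy, Precision, Recall, F1, and MCC to the precision given in the abstract.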
ISSN: 2307-8162