Machine Learning Model Explanations and Adversarial Attacks

Maxim Egorov, Igor Zianchurin, Ilya Kuzmenko, Danil Tarasov, Dmitry Namiot

Abstract


The paper considers the practical construction of adversarial attacks (evasion attacks) on machine learning models using explanations of their behavior. Although machine learning models are, in general, black boxes, there are techniques for constructing explanations (approximations of the model's decision logic) that make it possible to assess how exactly a decision is made. Even if the model is not a decision tree, a comparable decision-making explanation can be obtained for it; one example of such a technique is SHAP values. This approach allows attacks to be mounted in black-box mode. If the training dataset of the attacked model, or even part of it, is known, an attacker can use it to train a surrogate model of arbitrary architecture. Explanations constructed for this surrogate can then be used to craft an attack. Since adversarial attacks are transferable, such attacks can be reproduced against the attacked model. The source code for these experiments is provided in the paper. Network traffic classification models in an Internet of Things system are considered as the attack target.
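The pipeline described above (train a surrogate on the known data, rank features by SHAP values, perturb the most influential features, transfer the result to the victim) can be illustrated with a short Python sketch. This is not the authors' published code (see the GitHub references below for the actual experiments): the synthetic dataset stands in for the IoT traffic data, the GradientBoosting victim, the RandomForest surrogate, and the evade() helper with its k/eps parameters are all illustrative assumptions, and the sign-based perturbation is just one simple explanation-guided heuristic.

# Minimal sketch of the SHAP-guided black-box evasion pipeline; all model
# choices, parameters, and the evade() helper are illustrative assumptions.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in for the (partially) known training data of the attacked model.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Victim: the black-box model under attack (only its predictions are observed).
victim = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Surrogate of arbitrary architecture, trained by the attacker on the same data.
surrogate = RandomForestClassifier(n_estimators=100, random_state=0)
surrogate.fit(X_train, y_train)

# SHAP values for the surrogate rank features by their influence on the
# "malicious" class; the output layout differs across shap versions.
raw = shap.TreeExplainer(surrogate).shap_values(X_test)
sv = raw[1] if isinstance(raw, list) else raw[..., 1]  # (n_samples, n_features)

def evade(x, sv_row, k=5, eps=0.5):
    """Shift the k most influential features against their SHAP contribution,
    pushing the sample toward the 'benign' side of the surrogate's decision."""
    x_adv = x.copy()
    top = np.argsort(np.abs(sv_row))[-k:]
    x_adv[top] -= eps * np.sign(sv_row[top])
    return x_adv

# Craft adversarial variants of the "malicious" (class 1) test samples.
malicious = X_test[y_test == 1]
X_adv = np.array([evade(x, s) for x, s in zip(malicious, sv[y_test == 1])])

# Transferability check: how many malicious samples the victim still flags.
print("victim detection rate, original:   ", victim.predict(malicious).mean())
print("victim detection rate, adversarial:", victim.predict(X_adv).mean())

The final two lines compare the victim's detection rate before and after the perturbation; a drop on the victim, which the attacker never inspected directly, is exactly the transferability effect the abstract relies on.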


Full Text: PDF (Russian)

References


Ilyushin, Eugene, Dmitry Namiot, and Ivan Chizhov. "Attacks on machine learning systems - common problems and methods." International Journal of Open Information Technologies 10.3 (2022): 17-22.

Namiot, Dmitry. "Schemes of attacks on machine learning models." International Journal of Open Information Technologies 11.5 (2023): 68-86.

Zhao, Zhengyu, et al. "Towards good practices in evaluating transfer adversarial attacks." arXiv preprint arXiv:2211.09565 (2022).

MITRE ATLAS: Navigate threats to AI systems through real-world insights. https://atlas.mitre.org/ Retrieved: Jun, 2025

Slack, Dylan, et al. "Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods." Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. 2020.

Fidel, Gil, Ron Bitton, and Asaf Shabtai. "When explainability meets adversarial learning: Detecting adversarial examples using SHAP signatures." 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020.

Hickling, Thomas, Nabil Aouf, and Phillippa Spencer. "Robust adversarial attacks detection based on explainable deep reinforcement learning for UAV guidance and planning." IEEE Transactions on Intelligent Vehicles 8.10 (2023): 4381-4394.

Aryal, Kshitiz, et al. "Explainability guided adversarial evasion attacks on malware detectors." 2024 33rd International Conference on Computer Communications and Networks (ICCCN). IEEE, 2024.

IoT Intrusion dataset. https://www.kaggle.com/datasets/subhajournal/iotintrusion/data Retrieved: Jun, 2025

CICIoT2023 models. https://www.google.com/search?q=CICIoT2023 Retrieved: Jun, 2025

An introduction to explainable AI with Shapley values. https://shap.readthedocs.io/en/latest/example_notebooks/overviews/An%20introduction%20to%20explainable%20AI%20with%20Shapley%20values.html Retrieved: Jun, 2025

Molnar, Christoph. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/

Grini, Anass, et al. "Constrained Network Adversarial Attacks: Validity, Robustness, and Transferability." arXiv preprint arXiv:2505.01328 (2025).

GitHub 1, 2025. Available: https://github.com/lava-aaa/iot_hw

GitHub 2, 2025. Available: https://github.com/Dark-Avery/DDoS_classifier

Sukhomlin, Vladimir A. "The Concept and Main Characteristics of the Master's Degree Program 'Cybersecurity' of the Faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow State University." International Journal of Open Information Technologies 11.7 (2023): 143-148.

Namiot, Dmitry, and Vladimir Sukhomlin. "On cybersecurity of the Internet of Things systems." International Journal of Open Information Technologies 11.2 (2023): 85-97.

Namiot, Dmitry, Eugene Ilyushin, and Ivan Chizhov. "Artificial intelligence and cybersecurity." International Journal of Open Information Technologies 10.9 (2022): 135-147.




