Adversarial Test Toolbox

What is it?

The Adversarial Test Toolbox enables an in-depth assessment of the adversarial robustness of target models. It allows users to apply a variety of adversarial algorithms to generate powerful attacks and test their effectiveness in both white-box and black-box scenarios.

Given the threats to usability posed by the adversarial vulnerability of deep learning models, we use our recent research results on adversarial transferability to develop an automated tool for testing models against transfer-based attacks. The toolbox supports various object detection models and a range of attack algorithms for creating adversarial examples, which can then be applied to any selected target model to assess its adversarial robustness.

Why is it necessary?

DNNs are known to be vulnerable to data samples with deliberately added and often imperceptible perturbations. Benign images that a network otherwise classifies correctly can, when subjected to these perturbation vectors, be misclassified at a high rate. These perturbed images, or adversarial examples, are a severe threat to the usability of DNNs in safety-critical domains, as they can effectively fool a network into making the wrong decisions intended by an adversary. Moreover, adversarial examples are observed to be transferable: examples generated on one classifier are often effective on other classifiers trained to perform the same task. This enables an adversary to mount a black-box attack on a target network using adversarial images crafted on another network. Given these severe threats posed by the adversarial vulnerability of deep learning models, it becomes relevant to assess the robustness of models against such attacks.

Within the project, we conducted an in-depth analysis of the properties that affect the transferability of adversarial examples under various scenarios, with notable findings relating to the algorithms used to create adversarial examples and to model-related properties such as model size and architecture. The goal is to use these findings to help users assess model robustness. By allowing users to perform rigorous testing of deployed models against both white-box attacks (created and applied on the same network) and black-box attacks (transfer-based attacks applied to target models with different properties than the source network), the application is a step towards this goal.

How does it work?

A simple workflow of the Adversarial Test Toolbox is shown in Figure 1. The application works in two modes. The “create” mode is used to create adversarial examples on a selected base model. Users also provide the adversarial attack algorithm to use, along with a dataset of clean images. The application then uses the Adversarial Robustness Toolbox (ART) to generate adversarial examples and archives them as logs.
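For illustration, the snippet below is a minimal sketch of what such a “create” step looks like when driven directly through ART's Python API, here with a standard image classifier and a PGD attack. The model, attack parameters, and file name are placeholders; the toolbox itself wraps object detection models and its internals may differ.

```python
# Minimal sketch: generate PGD adversarial examples with ART.
# Model, attack parameters, and file names are illustrative placeholders.
import numpy as np
import torch
import torchvision
from art.attacks.evasion import ProjectedGradientDescent
from art.estimators.classification import PyTorchClassifier

# Wrap an arbitrary PyTorch model as an ART estimator.
model = torchvision.models.resnet18(weights=None)
classifier = PyTorchClassifier(
    model=model,
    loss=torch.nn.CrossEntropyLoss(),
    input_shape=(3, 224, 224),
    nb_classes=1000,
)

# Configure the PGD attack (perturbation budget, step size, iterations are example values).
attack = ProjectedGradientDescent(
    estimator=classifier, eps=8 / 255, eps_step=2 / 255, max_iter=10
)

# x_clean: batch of clean images as float32 in [0, 1], channels-first.
x_clean = np.random.rand(4, 3, 224, 224).astype(np.float32)
x_adv = attack.generate(x=x_clean)

# Archive the generated examples for later use in "transfer" mode.
np.save("adversarial_examples.npy", x_adv)
```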

Figure 1: The Adversarial Test Toolbox works in two modes. The figure depicts the workflow of both modes. Fraunhofer FOKUS

The “transfer” mode then allows users to apply the created adversarial samples to a target model. In this case, the application computes the mAP (mean Average Precision) on both the clean and the adversarial samples. The input configuration is provided through a YAML configuration file.
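As a rough sketch of the “transfer” step, the snippet below reads a hypothetical YAML configuration, runs a target detector on clean and adversarial batches, and compares mAP. All configuration keys, paths, and the choice of torchmetrics for evaluation are illustrative assumptions, not the toolbox's actual interface.

```python
# Minimal sketch of "transfer" mode: evaluate mAP of a target detector
# on clean vs. adversarial samples. Config schema and paths are hypothetical.
import numpy as np
import torch
import torchvision
import yaml
from torchmetrics.detection.mean_ap import MeanAveragePrecision

# Hypothetical configuration layout; the real schema may differ.
config = yaml.safe_load("""
target_model: fasterrcnn_resnet50_fpn
clean_images: data/coco_clean.npy
adversarial_images: logs/adversarial_examples.npy
""")

# Example target model; the toolbox lets users select their own.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def evaluate_map(images: np.ndarray, targets: list) -> float:
    """Run the target detector on a batch (float32, [0, 1], CHW) and return mAP."""
    metric = MeanAveragePrecision()
    with torch.no_grad():
        preds = model([torch.from_numpy(img) for img in images])
    metric.update(preds, targets)
    return float(metric.compute()["map"])

# ground_truth: list of dicts with "boxes" and "labels" tensors (not shown here).
# map_clean = evaluate_map(np.load(config["clean_images"]), ground_truth)
# map_adv = evaluate_map(np.load(config["adversarial_images"]), ground_truth)
# print(f"mAP clean: {map_clean:.3f}  mAP adversarial: {map_adv:.3f}")
```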

Figure 2 shows detections from a YOLOv3 model on a clean sample compared to detections on the corresponding adversarial sample (also on YOLOv3). Although the adversarial and the clean sample look identical, when provided as input to the model the adversarial example results in incorrect detections (zebras detected as person).

Figure 2: Predictions of the YOLOv3 model on a clean sample from the COCO dataset (left) and on the corresponding adversarial sample (right). The adversarial examples were created using the Projected Gradient Descent (PGD) attack. Fraunhofer FOKUS
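For reference, the PGD attack used in Figure 2 (Madry et al., see references below) constructs the adversarial example iteratively: starting from the clean image x, each step moves in the direction of the sign of the loss gradient and projects the result back into the allowed perturbation set S around x,

x^{t+1} = \Pi_{x+\mathcal{S}}\left( x^{t} + \alpha \, \operatorname{sign}\!\left( \nabla_{x} L(\theta, x^{t}, y) \right) \right)

where α is the step size, L the model loss, θ the model parameters, and y the true label.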

References and further reading

  • Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. arXiv:1312.6199. http://arxiv.org/abs/1312.6199 
  • Nicolae, M.-I., Sinn, M., Tran, M. N., Buesser, B., Rawat, A., Wistuba, M., Zantedeschi, V., Baracaldo, N., Chen, B., Ludwig, H., Molloy, I. M., & Edwards, B. (2019). Adversarial robustness toolbox v1.0.0. arXiv:1807.01069. https://arxiv.org/abs/1807.01069  
  • Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2019). Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083. http://arxiv.org/abs/1706.06083