1 Introduction
-
Effectiveness—With the best parameter setting of differential evolution (DE) and under extremely limited conditions, the attack achieves non-targeted success rates of 72.29%, 72.30%, and 61.28% on three common convolutional neural network architectures trained on the CIFAR-10 dataset: all convolutional network [21], network in network [11], and VGG16 [20] (Fig. 1). Further results on the ImageNet dataset show that, in non-targeted attacks, the labels of 31.87% of the validation images can be altered for the BVLC AlexNet model.
-
Black-box attack—The proposed attack requires only black-box feedback (probability labels) from the target CNN system, whereas many previous attacks require access to inner information such as gradients, network structure, or training data, which in practice is often hard or impossible to obtain. The capability of conducting a black-box attack with DE stems from the fact that it makes no assumptions about the optimization problem of finding an effective perturbation: rather than abstracting the problem into an explicit target function, it works directly on increasing (decreasing) the probability label values of the target (true) classes.
-
Efficiency—Many previous attacks require altering a considerable number of pixels to create an adversarial perturbation, which risks being perceptible to human observers and raises the cost of the modification (i.e., the more pixels that need to be modified, the higher the cost). The proposed attack modifies only 5 pixels, with an average distortion of 19.23 pixel values per channel per modified pixel on CIFAR-10 images. Moreover, the modification on the 5 pixels is further constrained by adding a term proportional to the strength of the accumulated modification to the fitness function of DE.
-
Scalability—The attack can be applied to more types of CNNs (e.g., networks that are not differentiable or whose gradients are difficult to compute), as long as feedback from the target system is available.
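The 5-pixel perturbation described above can be encoded as a flat vector of (x, y, r, g, b) tuples, one per modified pixel. The sketch below shows one plausible way to apply such a candidate to a CIFAR-10-sized image; the function name `apply_perturbation` and the exact clipping behavior are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def apply_perturbation(image, candidate):
    """Apply a few-pixel perturbation, encoded as a flat vector of
    (x, y, r, g, b) tuples, to a copy of an HxWx3 uint8 image."""
    perturbed = image.copy()
    for x, y, r, g, b in candidate.reshape(-1, 5):
        # Clip coordinates and color values to valid ranges before writing.
        xi = int(np.clip(x, 0, image.shape[0] - 1))
        yi = int(np.clip(y, 0, image.shape[1] - 1))
        perturbed[xi, yi] = np.clip([r, g, b], 0, 255)
    return perturbed

# Example: perturb 5 pixels of a 32x32 CIFAR-10-sized image.
img = np.zeros((32, 32, 3), dtype=np.uint8)
cand = np.array([[3, 4, 255, 0, 0],
                 [10, 10, 0, 255, 0],
                 [20, 5, 0, 0, 255],
                 [31, 31, 128, 128, 128],
                 [0, 0, 255, 255, 255]], dtype=float).flatten()
out = apply_perturbation(img, cand)
```

Encoding each candidate solution this way keeps the search space at only 5 dimensions per pixel, regardless of image resolution.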
2 Related works
3 Methodology
3.1 Problem description
3.2 Perturbation strength
3.3 Differential evolution and its variants
Variant | Success rate (%) | Confidence (%) | Cost |
---|---|---|---|
0.5/0.5/0.5 | 71.46 | 89.38 | 24.66 |
0.9/0.5/0.5 | 72.00 | 88.22 | 25.71 |
0.1/0.5/0.5 | 70.63 | 90.86 | 20.32 |
Variant | Success rate (%) | Confidence (%) | Cost |
---|---|---|---|
0.5/0.5/0.5 | 71.46 | 89.38 | 24.66 |
0.5/0.5/0.9 | 71.66 | 88.60 | 24.43 |
0.5/0.5/0.1 | 71.05 | 89.71 | 24.60 |
0.5/0.9/0.9 | 72.06 | 90.19 | 25.03 |
0.5/0.9/0.5 | 70.86 | 89.58 | 24.69 |
0.5/0.9/0.1 | 72.06 | 88.70 | 24.16 |
0.5/0.1/0.9 | 71.04 | 88.98 | 24.68 |
0.5/0.1/0.1 | 72.29 | 88.68 | 24.64 |
0.5/0.1/0.5 | 72.00 | 88.98 | 24.86 |
3.3.1 Mutation
3.3.2 Crossover
-
Crossover on position information. The crossover replaces only the position information (i.e., the first two dimensions) of xi∗ with that of xi. A probability value Cp determines whether the crossover is triggered. Exchanging coordinate information lets the offspring inherit the locations of vulnerable pixels contained in the current population.
-
Crossover on RGB values. The crossover replaces only the RGB value information (i.e., the last three dimensions) of xi∗ with that of xi. A probability value Crgb determines whether the crossover is triggered. Exchanging RGB information lets the offspring inherit vulnerable RGB perturbation values contained in the current population.
-
Crossover on both position and RGB values. This crossover combines the two above, under the assumption that both kinds of crossover are useful.
-
No crossover. The opposite of the above, assuming that exchanging either pixel-location or RGB-value information is not meaningful.
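The first three crossover variants above can be sketched as a single routine over (x, y, r, g, b) candidates, with Cp and Crgb controlling the position and RGB exchanges independently (setting either to 0 disables that exchange, covering the "no crossover" variant). This is a minimal illustration, not the paper's exact implementation; whether the probabilities are sampled per pixel or per candidate is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(0)

def crossover(parent, offspring, Cp=0.1, Crgb=0.1):
    """Per-pixel crossover on flat (x, y, r, g, b) candidates.

    With probability Cp the offspring pixel inherits the parent's
    position (first two dims); with probability Crgb it inherits the
    parent's RGB values (last three dims)."""
    child = offspring.reshape(-1, 5).copy()
    par = parent.reshape(-1, 5)
    for i in range(child.shape[0]):
        if rng.random() < Cp:    # crossover on position information
            child[i, :2] = par[i, :2]
        if rng.random() < Crgb:  # crossover on RGB values
            child[i, 2:] = par[i, 2:]
    return child.flatten()
```

With Cp = Crgb = 1 this reduces to the combined variant; with Cp = Crgb = 0 the offspring is returned unchanged (no crossover).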
3.3.3 Selection
3.3.4 Other DE variants
3.4 Using differential evolution for generating adversarial perturbation
-
Higher probability of finding the global optimum—DE is a meta-heuristic that is relatively less susceptible to local minima than gradient descent or greedy search algorithms, due in part to its diversity-keeping mechanisms and its use of a population of candidate solutions. The capability of finding better solutions (e.g., global rather than local optima) is necessary in our case, since this work imposes stricter constraints on the perturbation, so the quality of the optimization solution has to be guaranteed to a high extent.
-
Requires less information from the target system—DE does not require the optimization problem to be differentiable, as classical optimization methods such as gradient descent and quasi-Newton methods do. This is critical for generating adversarial images since (1) some networks are not differentiable, for instance [26], and (2) calculating gradients requires much more information about the target system, which is hardly realistic in many cases.
-
Simplicity—The approach proposed here is independent of the classifier used; for the attack to take place, it is sufficient to know the probability labels. In addition, most previous works abstract the search for an effective perturbation into a specific optimization problem (e.g., an explicit target function with constraints). That is, additional assumptions are made about the search problem, which may introduce additional complexity. DE does not solve any explicit target function but works directly with the probability label values of the target classes.
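The black-box loop described above can be sketched compactly: the only feedback used is a scalar probability label per query. The code below is a minimal DE/rand/1 sketch for a non-targeted attack, where `true_class_prob` is an assumed black-box query returning the target system's probability for the true class; population size, iteration count, and the absence of crossover are simplifications, not the paper's settings.

```python
import numpy as np

def de_attack(true_class_prob, bounds, pop_size=400, F=0.5,
              iters=100, seed=0):
    """Minimal DE sketch of a non-targeted black-box attack.

    true_class_prob(candidate) -> probability label of the true class
    from the target system; this scalar is the only feedback needed,
    no gradients or network internals.
    bounds: (low, high) arrays giving valid ranges per dimension."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    pop = rng.uniform(low, high, size=(pop_size, low.size))
    fit = np.array([true_class_prob(c) for c in pop])
    for _ in range(iters):
        for i in range(pop_size):
            # DE/rand/1 mutation: x* = x_r1 + F * (x_r2 - x_r3)
            r1, r2, r3 = rng.choice(pop_size, 3, replace=False)
            trial = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), low, high)
            f = true_class_prob(trial)
            if f < fit[i]:  # selection: keep the lower true-class probability
                pop[i], fit[i] = trial, f
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

Note that the fitness is simply the queried probability of the true class; minimizing it drives the system toward misclassification without ever formulating an explicit target function.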
3.5 Method and settings
3.6 Finding the best variant
4 Evaluation and results
-
Success rate—The empirical probability that a natural image can be successfully altered to a pre-defined class (targeted attack) or to an arbitrary other class (non-targeted attack) by adding the perturbation.
-
Confidence—The average probability label of the target class output by the target system when the label of an image is successfully altered from the true class to the target class.
-
Average distortion—The average per-channel modification on each attacked pixel, taken over the three color channels, is used to evaluate the cost of the attack. A higher average distortion means a higher cost, since the perturbation is more likely to be perceptible to human eyes.
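The three measures above can be computed from per-image attack records. The helper below is a sketch under the assumption that each record stores a success flag, the target-class probability, and the per-channel modifications of the attacked pixel; the record layout and function name are illustrative, not from the paper.

```python
import numpy as np

def attack_metrics(results):
    """Compute success rate, confidence, and average distortion from
    a list of records: (success: bool, target_prob: float,
    per_channel_distortion: list of 3 floats)."""
    successes = [r for r in results if r[0]]
    success_rate = len(successes) / len(results)
    # Confidence: mean target-class probability over successful attacks only.
    confidence = np.mean([r[1] for r in successes]) if successes else 0.0
    # Average distortion: mean per-channel modification per attacked pixel.
    avg_distortion = np.mean([np.mean(r[2]) for r in results])
    return success_rate, confidence, avg_distortion

records = [
    (True, 0.90, [20.0, 18.0, 22.0]),
    (False, 0.00, [15.0, 15.0, 15.0]),
]
sr, conf, dist = attack_metrics(records)
# sr = 0.5, conf = 0.90, dist = 17.5
```

Note that confidence is averaged only over successful attacks, while average distortion is averaged over all attempts.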
4.1 Comparison of DE variants and further experiments
4.2 Results
Variant | Success rate (%) | Confidence (%) | Cost |
---|---|---|---|
All convolutional net | | | |
0.1/0.1/0.1 | 71.86 | 90.30 | 20.44 |
0.5/0.1/0.1 | 72.29 | 88.68 | 24.64 |
Network in network | | | |
0.1/0.1/0.1 | 72.30 | 83.63 | 14.28 |
0.5/0.1/0.1 | 70.63 | 81.17 | 16.30 |
VGG network | | | |
0.1/0.1/0.1 | 56.49 | 67.36 | 22.98 |
0.5/0.1/0.1 | 61.28 | 73.07 | 24.62 |
BVLC network | | | |
0.1/0.1/0.1 | 31.87 | 14.88 | 2.36 |
0.5/0.1/0.1 | 26.69 | 14.79 | 6.19 |
4.2.1 Effectiveness and efficiency of attack
Method | Success rate (%) | Confidence (%) | Number (percentage) of pixels | Network |
---|---|---|---|---|
Proposed (0.1/0.1/0.1) | 72.30 | 83.63 | 5 (0.48%) | NiN |
Proposed (0.1/0.1/0.1) | 56.49 | 67.36 | 5 (0.48%) | VGG |
Proposed (0.1/0.1/0.1) | 71.86 | 90.30 | 5 (0.48%) | AllConv |
LSA | 97.89 | 72 | 33 (3.24%) | NiN |
LSA | 97.98 | 77 | 30 (2.99%) | VGG |
FGSM | 93.67 | 93 | 1024 (100%) | NiN |
FGSM | 90.93 | 90 | 1024 (100%) | VGG |
One-pixel | 72.85 | 75.02 | 1 (0.098%) | NiN |
One-pixel | 63.53 | 65.25 | 1 (0.098%) | VGG |
One-pixel | 68.71 | 79.4 | 1 (0.098%) | AllConv |