Introduction
-
Input filter before MLM: The primary purpose of the input filter is to catch adversarial input data by differentiating manipulated data from the training data. It examines each input by deploying an application-specific filter sequence. A set of filter sequences is selected (from a given library of filters) using an efficient search and optimization algorithm, a multi-objective genetic algorithm (MOGA). The MOGA finds sequences of filters (where each filter can detect adversarial traits/noises) that satisfy the constraints and three objectives: detecting the maximum number of attacks with high accuracy (above a specific threshold), with minimum processing time, and with a short ensemble filter sequence. Utilizing the Pareto set from MOGA runs and picking a filter sequence dynamically at different times makes filter selection unpredictable, and an active learning approach protects the ML model from adaptive attacks.
-
Output filter after MLM: Employ several class-specific latent-space transformations for outlier detection. After the MLM provides an output class label, we verify whether the input falls within that class's latent space. We build an ensemble of different outlier detection methods, sequence them dynamically, and retrain the outlier methods during runtime.
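A minimal sketch of the output-filter idea, assuming a distance-to-centroid test as a stand-in for the class-specific latent-space outlier detectors; the 99th-percentile threshold and the use of Euclidean distance are illustrative choices, not the paper's exact method:

```python
# Output filter: after the model predicts class c, check whether the input's
# latent representation lies inside class c's region. A distance-to-centroid
# test stands in for the class-specific latent-space outlier detectors.
import numpy as np

class ClassRegionVerifier:
    def __init__(self, percentile=99.0):
        self.percentile = percentile     # fraction of clean data kept inside
        self.centroids, self.thresholds = {}, {}

    def fit(self, latents, labels):
        """Learn one centroid and one distance threshold per class."""
        for c in np.unique(labels):
            z = latents[labels == c]
            mu = z.mean(axis=0)
            d = np.linalg.norm(z - mu, axis=1)
            self.centroids[c] = mu
            self.thresholds[c] = np.percentile(d, self.percentile)

    def is_consistent(self, latent, predicted_class):
        """True if the latent falls inside the predicted class's region."""
        mu = self.centroids[predicted_class]
        return np.linalg.norm(latent - mu) <= self.thresholds[predicted_class]
```

An input whose predicted label fails this check is treated as a potential adversarial example and routed to the adversarial dataset rather than returned to the user.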
Preliminaries
Adversarial machine learning (AML) attacks
Adversarial defense
Defense technique | Approach/scheme |
---|---|
Adversarial training | Ensemble adversarial training, a training methodology that incorporates perturbed inputs transferred from other pre-trained models [86] |
 | Extended adversarial and virtual adversarial training as a means of regularizing a text classifier by stabilizing the classification function [58] |
 | Training a state-of-the-art speech emotion recognizer on a mixture of clean and adversarial examples to aid regularization [21] |
Defensive distillation | |
Pre-processing defense | Using PCA, low-pass filtering, JPEG compression, and soft thresholding as pre-processing techniques to improve robustness [74] |
 | Use of two randomisation operations: (1) random resizing of input images and (2) random padding with zeros around the input images [94] |
Architecture alteration | Synonym encoding method that inserts an encoder before the input layer of the model and then trains the model to eliminate adversarial perturbations [90] |
 | An architecture using Bayesian classifiers (Gaussian processes with RBF kernels) to build more robust neural networks [12] |
Network verification | A verification algorithm for DNNs with the ReLU activation, proposed in [43], which verifies neural networks using a satisfiability modulo theory (SMT) solver |
 | A modification of [43] that encodes \(\max(x,y) = \mathrm{ReLU}(x-y)+y\) and \(|x| = \mathrm{ReLU}(2x)-x\) to reduce the computational time |
Ensembling countermeasures | An ensemble of classifiers whose predictions are combined by a weighted/unweighted average to increase robustness against attacks [76] |
 | A probabilistic ensemble framework against adversarial examples that capitalizes on intrinsic depth properties (e.g., probability divergence) of DNNs [1] |
Adversarial detection | Feature squeezing: features are squeezed either by decreasing each pixel's color bit depth or by smoothing the sample with a spatial filter; a binary classifier then uses as features the predictions of the target model before and after squeezing the input sample [95] |
 | A framework that utilizes ten non-intrusive image quality features to distinguish between legitimate and AA samples [4] |
 | A multiversion-programming-based audio AE detection approach that utilizes multiple off-the-shelf automatic speech recognition systems to determine whether an audio input is an AE [97] |
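The feature-squeezing detector of [95], summarized in the adversarial-detection rows above, can be sketched as below. Here `model` stands for any callable returning class probabilities, and the 0.5 decision threshold is a placeholder rather than the value used in [95]:

```python
# Feature-squeezing detection sketch: reduce the input's color bit depth,
# then flag inputs whose model prediction shifts too much after squeezing.
import numpy as np

def squeeze_bit_depth(x, bits=4):
    """Quantize pixel values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def is_adversarial(model, x, threshold=0.5):
    """Compare predictions before and after squeezing (L1 distance)."""
    p_raw = model(x)
    p_squeezed = model(squeeze_bit_depth(x))
    return float(np.abs(p_raw - p_squeezed).sum()) > threshold
```

Clean inputs typically yield nearly identical predictions before and after squeezing, while adversarial perturbations, which often exploit low-order bits, produce a large prediction gap.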
Nature of adversarial attacks
Advanced AAs are ineffective in the physical environment
Filter family | Code | Filter name | FGSM | BIM | PGD | JSMA |
---|---|---|---|---|---|---|
Analytical | FT4 | Distance | 50 | 70 | 70 | 50 |
 | FT10 | Morph | 75 | 70 | 70 | 60 |
Edge base | FT5 | Canny | 75 | 75 | 75 | 70 |
 | FT11 | Sobel | 50 | 75 | 50 | 75 |
 | FT16 | Gaussian edge | 75 | 70 | 75 | 75 |
Noise add | | Median Blur | 65 | 60 | 65 | 55 |
 | | Average Blur | 70 | 70 | 70 | 70 |
 | FT1 | Gaussian Blur | 70 | 65 | 70 | 60 |
 | FT7 | Gaussian Noise | 60 | 50 | 60 | 65 |
 | | Dilation | 70 | 70 | 70 | 75 |
 | | Opening | 75 | 70 | 75 | 65 |
 | | Closing | 70 | 50 | 70 | 75 |
 | | SaltAndPepper | 75 | 75 | 75 | 75 |
 | FT13 | SierraDithering | 70 | 70 | 50 | 70 |
Noise reduce | FT12 | Erosion | 75 | 55 | 75 | 70 |
 | FT0 | Sharpen | 70 | 70 | 75 | 50 |
 | FT6 | Shrink | 50 | 50 | 55 | 55 |
Texture | | OilPainting | 75 | 50 | 75 | 70 |
 | | Pixellate | 50 | 70 | 50 | 50 |
 | FT14 | Wavelet | 70 | 75 | 70 | 50 |
 | FT2 | Gabor | 50 | 70 | 50 | 50 |
 | FT8 | Census | 55 | 55 | 55 | 70 |
Transform | | Top_Hat | 70 | 50 | 70 | 75 |
 | | BlackHat | 70 | 50 | 75 | 55 |
 | FT9 | Laplace | 70 | 75 | 60 | 75 |
 | FT3 | Fourier | 55 | 55 | 75 | 75 |
 | | Exponential | 50 | 50 | 50 | 75 |
 | FT15 | Log-polar | 50 | 55 | 75 | 65 |
 | | Mirror | 55 | 50 | 75 | 60 |
 | | TopHat | 55 | 50 | 55 | 75 |
 | | WaterWave | 75 | 50 | 75 | 70 |
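To illustrate how an individual filter from the table can expose adversarial noise, the sketch below applies a 3x3 mean blur (a stand-in for the blur filters in the "Noise add" family) and scores an image by the residual the filter removes; high-frequency adversarial perturbations typically raise this score. The threshold that would separate clean from adversarial scores is left out, as it must be tuned per dataset:

```python
# One-filter illustration: blur the image and measure the energy the filter
# removes. Clean images change little; high-frequency adversarial noise
# changes more.
import numpy as np

def mean_blur(img):
    """3x3 mean filter with edge replication (numpy-only, 2-D grayscale)."""
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
    return out / 9.0

def residual_score(img):
    """Mean absolute difference removed by the blur filter."""
    return float(np.abs(img - mean_blur(img)).mean())
```

A smooth image (e.g., a gradient) yields a near-zero score, while the same image with a high-frequency perturbation added scores noticeably higher, which is the signal the filter-metric extraction relies on.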
Clean and adversarial inputs have an identifiable noise difference
The same filtering technique will work for all ML models on a specific dataset
Different filters have different effectiveness in detecting AAs
Outlier detection methods can detect AAs as outliers
Abbr | Algorithm | Accuracy |
---|---|---|
OCSVM [26] | One-class SVM | 99 |
LMDD [7] | Deviation-based | 98 |
LOF [14] | Local outlier factor | 98 |
COF [81] | Connectivity-based | 91 |
CBLOF [40] | Clustering-based | 92 |
HBOS [32] | Histogram-based | 91 |
kNN [70] | k nearest neighbors | 91 |
ABOD [46] | Angle-based | 62 |
COPOD [49] | Copula-based | 75 |
SOS [42] | Stochastic outlier selection | 66 |
IF [83] | Isolation forest | 99 |
FB [48] | Feature bagging | 99 |
XGBOD [100] | Extreme boosting based | 26 |
AutoEncoder [2] | Fully connected AutoEncoder | 43 |
VAE [45] | Variational AutoEncoder | 41 |
SO_GAAL [51] | Single-objective GAN | 40 |
MO_GAAL [51] | Multiple-objective GAN | 35 |
Vdetector [102] | Variable size NSA | 99 |
RNSA [30] | Random real value NSA | 75 |
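As a concrete instance of one detector from the table, the sketch below gives a minimal numpy version of the kNN detector [70]: score a sample by the distance to its k-th nearest clean training sample and flag scores above a percentile threshold learned from the clean data. The values k=5 and the 99th percentile are illustrative, not tuned settings:

```python
# Minimal kNN-distance outlier detector: adversarial examples tend to sit
# far from the clean training manifold, so their k-th nearest-neighbor
# distance exceeds a threshold learned from clean data.
import numpy as np

class KNNOutlierDetector:
    def __init__(self, k=5, percentile=99.0):
        self.k, self.percentile = k, percentile

    def fit(self, X):
        """Store clean data and set the threshold from self-scores."""
        self.X = np.asarray(X, dtype=float)
        scores = np.array([self._score(x, exclude_self=True) for x in self.X])
        self.threshold = np.percentile(scores, self.percentile)

    def _score(self, x, exclude_self=False):
        d = np.sort(np.linalg.norm(self.X - x, axis=1))
        return d[self.k] if exclude_self else d[self.k - 1]  # skip d[0]=0 for self

    def is_outlier(self, x):
        return self._score(np.asarray(x, dtype=float)) > self.threshold
```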
Static defenses can be bypassed by adaptive attacks
Defense objectives
-
Defense needs to work against a diverse set of attack types. Our defense technique should work against gradient-based or gradient-free, white-box or black-box, targeted or untargeted, and adaptive attacks [17].
-
Defense should not reduce the accuracy of ML models. Model accuracy should not be affected after deploying our defense technique.
-
Defense needs to identify threats fast. If a defense system takes sizeable computational time and resources, it loses practicality. For example, if the defense is employed on an autonomous car's sensors, the input responses need to be evaluated quickly; otherwise, an accident can happen.
-
Defense should not modify the ML architecture. The defense should work for both white-box and black-box models. A trained ML model's architectural information is usually black-box, so the defense framework is expected to comply with that.
-
Defense should be adaptive and dynamic in nature to prevent adaptive attacks.
-
Defense should not need updating if the ML model changes (e.g., ResNet to VGG, or ANN to RNN), and it should support multiple domains (image, audio, text).
Our proposed methodology
-
The input will be sent to the adversarial dataset, and the process will terminate.
-
The adversarial dataset will retrain the filter-sequence search for noise detection and update the threshold value.
-
If S3 is open, the extracted filter metric values will be sent to the outlier detection system.
-
If S2 is open, the input data will be sent to the ML model and to switch S5.
-
The input will be sent to the adversarial dataset, and the process will terminate.
-
The adversarial dataset will retrain the filter-sequence search for noise detection and update the threshold value.
-
S4 will provide the final output class, and S5 will send the input to the clean dataset, which will trigger retraining of the outlier methods and update the outlier decision boundary.
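The switch-based flow above can be sketched as a single control function. The component callables (noise filter, model, outlier check) and the mapping to switches S2-S5 are stand-ins inferred from the bullets, not the paper's exact implementation:

```python
# Control-flow sketch of the switch-based pipeline: input filter, model,
# output-side outlier check, and the two retraining datasets.
def process_input(x, noise_filter, ml_model, outlier_check,
                  adversarial_dataset, clean_dataset):
    passed, filter_metrics = noise_filter(x)      # input filter before MLM
    if not passed:                                # noise detected
        adversarial_dataset.append(x)             # triggers filter retraining
        return None                               # process terminates
    label = ml_model(x)                           # S2: forward to the model
    if not outlier_check(filter_metrics, label):  # S3: outlier detection
        adversarial_dataset.append(x)             # triggers filter retraining
        return None
    clean_dataset.append(x)                       # S5: retrain outlier methods
    return label                                  # S4: final output class
```

The two `append` calls correspond to the retraining triggers in the bullets: the adversarial dataset updates the filter-sequence search and its threshold, while the clean dataset updates the outlier detectors' decision boundaries.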
Multi-objective genetic search for filters
Perturb range/threshold determination of filters
Encoding
Fitness function
Crossover, mutation and selection
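The MOGA components named above (encoding, fitness evaluation, and crossover/mutation/selection) can be sketched as follows. The per-filter detection rates and processing times are illustrative placeholders, not values from the filter table, and the mutation-only loop with a Pareto filter is a simplified stand-in for the full MOGA:

```python
# MOGA sketch: a chromosome is an ordered filter sequence; the three
# objectives are detection accuracy (maximized), processing time, and
# sequence length (both minimized).
import random

FILTERS = {  # filter code -> (standalone detection rate, time in ms), assumed
    "FT0": (0.70, 2.0), "FT1": (0.70, 3.5), "FT5": (0.75, 4.0),
    "FT7": (0.60, 1.5), "FT9": (0.70, 2.5), "FT12": (0.75, 3.0),
}

def evaluate(seq):
    """Objectives: (ensemble detection rate, total time, length)."""
    miss = 1.0
    for f in seq:                       # an attack slips through only if
        miss *= 1.0 - FILTERS[f][0]     # every filter in the sequence misses
    return (1.0 - miss, sum(FILTERS[f][1] for f in seq), len(seq))

def dominates(a, b):
    """Pareto dominance; accuracy is maximized, time and length minimized."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    strictly = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly

def pareto_front(population):
    """Keep only non-dominated filter sequences."""
    objs = {seq: evaluate(list(seq)) for seq in population}
    return [list(s) for s in objs
            if not any(dominates(objs[o], objs[s]) for o in objs if o != s)]

def search(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    codes = list(FILTERS)
    pop = {tuple(rng.sample(codes, rng.randint(1, 4))) for _ in range(pop_size)}
    for _ in range(generations):
        parent = list(rng.choice(sorted(pop)))
        child = parent[:]                         # mutation: swap one filter
        child[rng.randrange(len(child))] = rng.choice(codes)
        pop.add(tuple(dict.fromkeys(child)))      # drop duplicate filters
    return pareto_front(pop)
```

At deployment, one sequence from the returned Pareto front would be picked at random per the dynamic-selection strategy, so an attacker cannot predict which filter sequence is active.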
One class classifications outlier method
Attack type | Step1 | Step2 | Step3 | Step4 |
---|---|---|---|---|
FGSM | 0.86 | 0.92 | 0.902 | 0.93 |
BIM | 0.89 | 0.90 | 0.90 | 0.90 |
PGD | 0.92 | 0.94 | 0.95 | 0.95 |
Adaptiveness and dynamic selection
Attack type | NSA | OCSVM | IF | VAE | SOGAL | MOGAL |
---|---|---|---|---|---|---|
FGSM | 0.93 | 0.99 | 0.93 | 0.65 | 0.5 | 0.5 |
BIM | 0.90 | 0.98 | 0.91 | 0.66 | 0.5 | 0.5 |
PGD | 0.95 | 0.99 | 0.92 | 0.50 | 0.5 | 0.5 |
MBIM | 0.91 | 0.98 | 0.94 | 0.46 | 0.5 | 0.5 |
HSJ | 0.88 | 0.55 | 0.41 | 0.65 | ||
JSMA | 0.9 | 0.56 | 0.8 | 0.83 | ||
CW | 0.96 | 0.42 | 0.66 | 0.52 | ||
DF | 0.91 | 0.45 | 0.76 | 0.55 |
-
Dynamic selection of the filter-set sequence, which will make it harder to formulate an adaptive attack based on known filter knowledge.
-
Dynamic selection of the outlier detection method, which forces an adaptive attack to account for every outlier detection method when crafting attack inputs, making input generation computationally expensive.
-
The defense is always learning, continually changing the filter sequences and the decision boundaries of the outlier detection models. This makes it difficult for an adaptive attack to search for the decision boundary.
-
To protect against continuous query-based attacks, we will monitor and analyze input trends using the Kolmogorov-Smirnov (K-S) test. The number of inputs considered for the K-S test will be dynamic. Formulating a query-based attack on the defense system will be hard due to the randomness of the K-S sample size. Our input-trend detection system can effectively monitor adaptive attacks and take countermeasures.
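The K-S-based input-trend monitor above can be sketched with a plain-numpy two-sample K-S statistic. The reference set, window, and alarm threshold are illustrative assumptions; at runtime the window size would be randomized as described above:

```python
# Input-trend monitor: compare a recent window of input scores against a
# clean reference distribution; a large two-sample K-S statistic suggests a
# continuous query-based attack is in progress.
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    all_vals = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, all_vals, side="right") / len(a)
    cdf_b = np.searchsorted(b, all_vals, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())

def trend_alarm(reference, window, threshold=0.3):
    """Raise an alarm when the recent window drifts from the clean reference."""
    return ks_statistic(reference, window) > threshold
```

A shifted query distribution (as produced by iterative boundary probing) yields a statistic near the size of the shift, while benign traffic drawn from the reference distribution stays near zero.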
Experiments
Dataset generation
Model | AUC | CA | F1 | Precision | Recall | LogLoss | Specificity |
---|---|---|---|---|---|---|---|
Random Forest | 0.973 | 0.845 | 0.844 | 0.844 | 0.845 | 0.412 | 0.926 |
kNN | 0.870 | 0.643 | 0.624 | 0.626 | 0.643 | 0.753 | 0.810 |
Naive Bayes | 0.794 | 0.562 | 0.444 | 0.367 | 0.562 | 0.947 | 0.691 |
Neural network | 0.815 | 0.573 | 0.501 | 0.629 | 0.573 | 0.919 | 0.763 |
SVM | 0.527 | 0.523 | 0.399 | 0.434 | 0.523 | 2.073 | 0.606 |
Logistic regression | 0.813 | 0.566 | 0.489 | 0.442 | 0.566 | 0.952 | 0.724 |
Model | AUC | CA | F1 | Precision | Recall | LogLoss | Specificity |
---|---|---|---|---|---|---|---|
Random Forest | 0.998 | 0.970 | 0.970 | 0.970 | 0.970 | 0.154 | 0.985 |
kNN | 0.966 | 0.844 | 0.840 | 0.837 | 0.844 | 0.332 | 0.928 |
Naive Bayes | 0.914 | 0.737 | 0.740 | 0.749 | 0.737 | 0.613 | 0.916 |
Neural network | 0.951 | 0.816 | 0.810 | 0.807 | 0.816 | 0.420 | 0.919 |
SVM | 0.860 | 0.302 | 0.208 | 0.681 | 0.302 | 1.598 | 0.853 |
Logistic regression | 0.937 | 0.790 | 0.783 | 0.778 | 0.790 | 0.473 | 0.910 |
Model | AUC | CA | F1 | Precision | Recall | LogLoss | Specificity |
---|---|---|---|---|---|---|---|
Random Forest | 1.000 | 0.999 | 0.999 | 0.999 | 0.999 | 0.007 | 1.000 |
kNN | 0.999 | 0.984 | 0.984 | 0.984 | 0.984 | 0.038 | 0.995 |
Naive Bayes | 0.999 | 0.983 | 0.983 | 0.984 | 0.983 | 0.203 | 0.994 |
Neural network | 1.000 | 0.998 | 0.998 | 0.998 | 0.998 | 0.008 | 0.999 |
SVM | 0.896 | 0.590 | 0.531 | 0.815 | 0.590 | 1.317 | 0.864 |
Logistic regression | 0.999 | 0.983 | 0.983 | 0.983 | 0.983 | 0.056 | 0.994 |
Model | AUC | CA | F1 | Precision | Recall | LogLoss | Specificity |
---|---|---|---|---|---|---|---|
Random Forest | 1.000 | 0.999 | 0.999 | 0.999 | 0.999 | 0.002 | 0.998 |
kNN | 1.000 | 0.998 | 0.998 | 0.998 | 0.998 | 0.005 | 0.995 |
Naive Bayes | 0.998 | 0.999 | 0.999 | 0.999 | 0.999 | 0.042 | 0.997 |
Neural network | 1.000 | 0.999 | 0.999 | 0.999 | 0.999 | 0.005 | 0.997 |
SVM | 0.996 | 0.753 | 0.652 | 0.814 | 0.753 | 0.563 | 0.265 |
Logistic regression | 0.999 | 0.997 | 0.997 | 0.997 | 0.997 | 0.017 | 0.992 |
Actual \ Predicted | 0 (%) | 1 (%) | 2 (%) | 3 (%) | 4 (%) | 5 (%) | 6 (%) | 7 (%) | 8 (%) | 9 (%) | \(\sum \) |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 96.7 | 0.2 | 0.4 | 0.2 | 0.5 | 0.2 | 0.3 | 0.7 | 0.6 | 0.2 | 3259 |
1 | 0.2 | 97.5 | 0.3 | 0.3 | 0.3 | 0.4 | 0.2 | 0.4 | 0.4 | 0.0 | 3313 |
2 | 0.4 | 0.3 | 95.4 | 0.6 | 0.8 | 0.8 | 0.7 | 0.4 | 0.5 | 0.1 | 3254 |
3 | 0.6 | 0.3 | 0.5 | 95.4 | 0.7 | 0.6 | 0.4 | 0.5 | 0.7 | 0.2 | 3272 |
4 | 0.5 | 0.5 | 0.5 | 0.5 | 96.4 | 0.3 | 0.5 | 0.1 | 0.5 | 0.3 | 3262 |
5 | 0.6 | 0.5 | 0.7 | 0.6 | 0.7 | 95.1 | 0.6 | 0.6 | 0.6 | 0.2 | 3265 |
6 | 0.5 | 0.4 | 0.5 | 0.3 | 0.5 | 0.5 | 96.5 | 0.2 | 0.4 | 0.1 | 3263 |
7 | 0.3 | 0.5 | 0.3 | 0.3 | 0.6 | 0.4 | 0.3 | 96.8 | 0.3 | 0.2 | 3292 |
8 | 0.5 | 0.3 | 0.4 | 0.1 | 0.3 | 0.3 | 0.6 | 0.5 | 96.5 | 0.4 | 3309 |
9 | 0.1 | 0.2 | 0.2 | 0.3 | 0.2 | 0.1 | 0.2 | 0.2 | 0.2 | 98.5 | 3266 |
\(\sum \) | 3275 | 3334 | 3224 | 3225 | 3295 | 3227 | 3271 | 3307 | 3331 | 3266 | 32755 |
Experiment with CIFAR and IMAGENET
Models used | MNIST FGSM | MNIST JSMA | MNIST CW | CIFAR FGSM | CIFAR JSMA | CIFAR CW |
---|---|---|---|---|---|---|
MCD | 0.9846 | 0.99 | 0.9101 | 0.8616 | 0.864 | 0.7871 |
OCSVM | 0.6851 | 0.697 | 0.5421 | 0.8731 | 0.535 | 0.5417 |
LMDD | 0.6673 | 0.601 | 0.553 | 0.5752 | 0.561 | 0.5965 |
LOF | 0.997 | 0.912 | 0.93 | 0.8963 | 0.832 | 0.8096 |
COF | 0.3991 | 0.37 | 0.3568 | | | |
CBLOF | 0.9866 | 0.959 | 0.9 | | | |
HBOS | 0.9865 | 0.915 | 0.9 | 0.8354 | 0.859 | 0.0016 |
KNN | 0.9993 | 0.909 | 0.9628 | 0.9957 | 0.925 | 0.0682 |
SOD | 0.3842 | 0.461 | 0.3831 | | | |
ABOD | 0.9994 | 0.999 | 0.9776 | 0.9982 | 0.922 | 0.8881 |
COPOD | 0.9273 | 0.996 | 0.8105 | 0.8255 | 0.803 | 0.7099 |
SOS | 0.4551 | 0.37 | | | | |
FB | 0.9942 | 0.99 | 0.9692 | 0.8863 | 0.839 | 0.7716 |
IF | 0.9933 | 0.97 | 0.89 | 0.8444 | 0.834 | 0.6339 |
LSCP | 0.9992 | 0.9 | 0.9832 | 0.8982 | 0.827 | 0.78 |
XGBOD | 0.5 | 0.5 | 0.59 | | | |
LODA | 0.9703 | 0.99 | 0.91 | 0.7766 | 0.661 | 0.6286 |
AE | 0.6738 | 0.73 | 0.62 | | | |
VAE | 0.8833 | 0.78 | 0.7 | | | |
SOGAL | 0.4 | 0.3 | 0.3 | | | |
MOGAL | 0.2 | 0.374 | 0.34 | | | |
V-Detector | 0.98 | 0.99 | 0.94 | 0.99 | 0.86 | 0.78 |
Attack | CIFAR ‘CAT’ | CIFAR ‘Truck’ | CIFAR ‘DOG’ | CIFAR ‘Ship’ | Imagenet ‘gorilla’ | Imagenet ‘hyena’ |
---|---|---|---|---|---|---|
FGSM | 0.93 | 0.92 | 0.93 | 0.92 | 0.68 | 0.87 |
BIM | 0.90 | 0.90 | 0.91 | 0.71 | 0.83 | 0.82 |
PGD | 0.95 | 0.92 | 0.92 | 0.90 | 0.73 | 0.72 |
MBIM | 0.91 | 0.90 | 0.94 | 0.96 | 0.73 | 0.72 |
HSJ | 0.84 | 0.65 | 0.80 | 0.65 | ||
JSMA | 0.7 | 0.76 | 0.7 | 0.73 | 0.63 | 0.62 |
CW | 0.76 | 0.67 | 0.66 | 0.62 |
AML detection method | MNIST FGSM | MNIST JSMA | MNIST HSJ | MNIST CW | CIFAR FGSM | CIFAR JSMA | CIFAR HSJ | CIFAR CW | Avg |
---|---|---|---|---|---|---|---|---|---|
RF [38] | 0.96 | 0.84 | 0.98 | 0.66 | 0.64 | 0.63 | 0.60 | 0.72 | 0.77 |
KNN [38] | 0.98 | 0.80 | 0.98 | 0.6 | 0.56 | 0.52 | 0.52 | 0.69 | 0.73 |
SVM [38] | 0.98 | 0.89 | 0.98 | – | 0.69 | 0.69 | 0.64 | 0.77 | 0.81 |
Feature Squeezing [95] | 1.00 | 1.00 | – | 0.20 | 0.88 | 0.77 | – | 0.77 | |
Ensemble [10] | 0.99 | – | 0.45 | – | 0.99 | – | 0.42 | – | 0.71 |
Decision mismatch [59] | 0.93 | 0.93 | 0.91 | – | 0.93 | 0.97 | 0.91 | – | 0.93 |
Image quality features [5] | 1.00 | 0.90 | 1.00 | – | 0.72 | 0.70 | 0.68 | – | 0.83 |
(Our framework) | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.98 | 0.99 | 0.94 | 0.99 |
Comparison with other methods
-
Denoising strategy or gradient masking: attempts to remove the distortions from the image.
-
Basic adversarial training: trains the neural network with adversarial examples.
-
Ensemble methods: combine multiple neural networks trained on transformed datasets and take a majority result.
-
Use of a commutative dual-filtering technique in any AI/ML-based utility application.
-
Use of negative filtering will prevent a Trojaned AI from changing decisions, resulting in robust AI/ML systems.
-
Easy incorporation into existing and future ML systems will increase adoption and deployability.
-
Enhanced performance/accuracy and robustness will benefit ML products and online services across diverse applications.
-
Improved security will improve users' quality of experience.