ABSTRACT
Deep learning has shown impressive performance on hard perceptual problems. However, researchers have found deep learning systems to be vulnerable to small, specially crafted perturbations that are imperceptible to humans. Such perturbations cause deep learning systems to misclassify adversarial examples, with potentially disastrous consequences where safety or security is crucial. Prior defenses against adversarial examples either targeted specific attacks or were shown to be ineffective.
We propose MagNet, a framework for defending neural network classifiers against adversarial examples. MagNet neither modifies the protected classifier nor requires knowledge of the process for generating adversarial examples. MagNet includes one or more separate detector networks and a reformer network. The detector networks learn to differentiate between normal and adversarial examples by approximating the manifold of normal examples. Since they assume no specific process for generating adversarial examples, they generalize well. The reformer network moves adversarial examples towards the manifold of normal examples, which is effective for correctly classifying adversarial examples with small perturbations. We discuss the intrinsic difficulties in defending against whitebox attacks and propose a mechanism to defend against graybox attacks. Inspired by the use of randomness in cryptography, we use diversity to strengthen MagNet. We show empirically that MagNet is effective against state-of-the-art attacks in blackbox and graybox scenarios without sacrificing the false positive rate on normal examples.
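To make the two-pronged pipeline concrete, the sketch below shows how a detector-plus-reformer defense could be wired around an existing classifier. It assumes that the detectors and the reformer are autoencoders trained only on normal examples, which is one natural way to approximate the manifold described above; the class name MagNetDefense, the per-detector thresholds, and the rejection label are illustrative choices for this sketch, not the authors' reference implementation.

```python
import numpy as np

def calibrate_threshold(autoencoder, normal_x, fp_rate=0.001):
    """Pick a reconstruction-error threshold so that only `fp_rate` of
    held-out normal examples would be flagged as adversarial."""
    errors = np.mean(np.abs(autoencoder(normal_x) - normal_x),
                     axis=tuple(range(1, normal_x.ndim)))
    return np.quantile(errors, 1.0 - fp_rate)

class MagNetDefense:
    """Illustrative two-pronged defense: detectors reject inputs that lie far
    from the learned manifold of normal examples; the reformer pulls the
    remaining inputs back toward that manifold before classification."""

    def __init__(self, detector_autoencoders, reformer_autoencoder,
                 classifier, thresholds):
        # One reconstruction-error threshold per detector, calibrated on
        # held-out normal data (see calibrate_threshold above).
        self.detectors = detector_autoencoders
        self.reformer = reformer_autoencoder
        self.classifier = classifier
        self.thresholds = thresholds

    def _reconstruction_error(self, autoencoder, x):
        # Mean absolute reconstruction error, one value per example.
        return np.mean(np.abs(autoencoder(x) - x),
                       axis=tuple(range(1, x.ndim)))

    def is_adversarial(self, x):
        # Flag an input if ANY detector's reconstruction error exceeds its
        # threshold (logical OR over the detector ensemble).
        flags = np.zeros(len(x), dtype=bool)
        for det, thr in zip(self.detectors, self.thresholds):
            flags |= self._reconstruction_error(det, x) > thr
        return flags

    def predict(self, x):
        # Reject detected adversarial inputs; reform and classify the rest.
        rejected = self.is_adversarial(x)
        reformed = self.reformer(x)
        labels = np.argmax(self.classifier(reformed), axis=1)
        labels[rejected] = -1  # -1 marks "rejected as adversarial"
        return labels
```

Because the thresholds are set from a high percentile of reconstruction errors on held-out normal data, only a small, chosen fraction of normal inputs is rejected, which is how a defense of this shape can avoid sacrificing the false positive rate on normal examples.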