Research Article
DOI: 10.1145/3133956.3134057

MagNet: A Two-Pronged Defense against Adversarial Examples

Published: 30 October 2017

ABSTRACT

Deep learning has shown impressive performance on hard perceptual problems. However, researchers have found deep learning systems to be vulnerable to small, specially crafted perturbations that are imperceptible to humans. Such perturbations cause deep learning systems to misclassify adversarial examples, with potentially disastrous consequences where safety or security is crucial. Prior defenses against adversarial examples either targeted specific attacks or were shown to be ineffective.

We propose MagNet, a framework for defending neural network classifiers against adversarial examples. MagNet neither modifies the protected classifier nor requires knowledge of the process used to generate adversarial examples. MagNet includes one or more separate detector networks and a reformer network. The detector networks learn to differentiate between normal and adversarial examples by approximating the manifold of normal examples. Since they assume no specific process for generating adversarial examples, they generalize well. The reformer network moves adversarial examples towards the manifold of normal examples, which is effective for correctly classifying adversarial examples with small perturbations. We discuss the intrinsic difficulties of defending against whitebox attacks and propose a mechanism to defend against graybox attacks. Inspired by the use of randomness in cryptography, we use diversity to strengthen MagNet. We show empirically that MagNet is effective against the most advanced state-of-the-art attacks in blackbox and graybox scenarios without sacrificing the false positive rate on normal examples.
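
The abstract's two prongs map naturally onto autoencoders trained only on normal examples: the detector flags inputs that the autoencoder reconstructs poorly (i.e., inputs far from the learned manifold), and the reformer replaces an input with its reconstruction. Below is a minimal PyTorch sketch under that reading. The architecture and the names Autoencoder, reconstruction_error, calibrate_threshold, and defended_predict are illustrative assumptions, not the authors' reference implementation, and the paper's full detector suite (which also measures the divergence between classifier outputs on an input and on its reconstruction) is omitted here.

    # Minimal sketch of MagNet-style detection and reformation
    # (assumed PyTorch rendering; architecture and hyperparameters
    # are illustrative, not taken from the paper).
    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        """Autoencoder trained to reconstruct normal (clean) examples,
        e.g. 1x28x28 MNIST images scaled to [0, 1]."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.AvgPool2d(2),
                nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.Upsample(scale_factor=2),
                nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    def reconstruction_error(ae, x):
        """Per-example mean squared reconstruction error."""
        with torch.no_grad():
            return ((ae(x) - x) ** 2).flatten(1).mean(dim=1)

    def calibrate_threshold(ae, normal_loader, fp_rate=0.001):
        """Pick the detection threshold so that roughly `fp_rate` of
        held-out normal examples are (wrongly) flagged, fixing the
        false-positive budget on clean data."""
        errs = torch.cat([reconstruction_error(ae, x)
                          for x, _ in normal_loader])
        return torch.quantile(errs, 1.0 - fp_rate).item()

    def defended_predict(classifier, ae, x, threshold):
        """Two-pronged pipeline: reject inputs the detector flags as
        far from the normal manifold, then classify the reformed
        (reconstructed) version of the rest."""
        flagged = reconstruction_error(ae, x) > threshold  # detector
        with torch.no_grad():
            preds = classifier(ae(x)).argmax(dim=1)        # reformer
        preds[flagged] = -1  # -1 marks "rejected as adversarial"
        return preds

The diversity mentioned in the abstract would then amount to training several such autoencoders and, per input, randomly choosing which one serves as the reformer, so that a graybox attacker cannot tailor perturbations to a single fixed network.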



Published in

CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security
October 2017, 2682 pages
ISBN: 9781450349468
DOI: 10.1145/3133956

        Copyright © 2017 ACM


Publisher

Association for Computing Machinery, New York, NY, United States



Acceptance Rates

CCS '17 paper acceptance rate: 151 of 836 submissions, 18%. Overall CCS acceptance rate: 1,261 of 6,999 submissions, 18%.
