Research article (Public Access) · DOI: 10.1145/3128572.3140444

Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods

Published: 03 November 2017

ABSTRACT

Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly. In order to better understand the space of adversarial examples, we survey ten recent proposals that are designed for detection and compare their efficacy. We show that all can be defeated by constructing new loss functions. We conclude that adversarial examples are significantly harder to detect than previously appreciated, and the properties believed to be intrinsic to adversarial examples are in fact not. Finally, we propose several simple guidelines for evaluating future proposed defenses.
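
The attack approach the abstract alludes to, constructing a new loss function that targets the classifier and the detector jointly, can be sketched concretely. The sketch below is an illustrative PyTorch reconstruction, not the paper's exact formulation; the `classifier` and `detector` interfaces and the weighting constants `c` and `d` are assumptions made for illustration.

    import torch

    def bypass_loss(x_adv, x_orig, target, classifier, detector, c=1.0, d=1.0):
        """Joint loss: make the classifier predict `target` while keeping the
        detector's score below its flagging threshold, at small L2 distortion.

        Assumed interfaces: classifier(x) returns a logit vector; detector(x)
        returns a scalar logit that is positive when x is flagged adversarial.
        """
        logits = classifier(x_adv)
        other = logits.clone()
        other[target] = float("-inf")  # exclude the target class from the max
        # C&W-style margin: positive while any other logit still beats the target.
        cls_term = torch.clamp(other.max() - logits[target], min=0.0)
        # Detector term: positive while the detector still flags the input.
        det_term = torch.clamp(detector(x_adv), min=0.0)
        # Distortion term keeps the adversarial example close to the original.
        dist = torch.sum((x_adv - x_orig) ** 2)
        return dist + c * cls_term + d * det_term

Starting from x_adv = x_orig and minimizing this loss with a gradient-based optimizer such as Adam, with c and d tuned (for example, by binary search), yields an input that is simultaneously misclassified and judged benign by the detector; this mirrors the general recipe behind the paper's white-box attacks.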


Published in

AISec '17: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security
November 2017, 140 pages
ISBN: 9781450352024
DOI: 10.1145/3128572
Copyright © 2017 ACM
Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

AISec '17 paper acceptance rate: 11 of 36 submissions, 31%. Overall acceptance rate: 94 of 231 submissions, 41%.
