DOI: 10.1145/3128572.3140448

ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models

Authors: Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, Cho-Jui Hsieh

Published: 03 November 2017

ABSTRACT

Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research has raised ever-increasing concern about the robustness of DNNs to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of well-trained DNNs by demonstrating the ability to generate adversarial images that are barely noticeable to both humans and machines yet lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable: by simply training and attacking a substitute model built upon the target model, one can mount a black-box attack on the target DNN.

Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, instead of leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks that directly estimate the gradients of the targeted DNN to generate adversarial examples. We use zeroth order stochastic coordinate descent together with dimension reduction, hierarchical attack, and importance sampling techniques to attack black-box models efficiently. By exploiting zeroth order optimization, improved attacks on the targeted DNN can be accomplished, sparing the need to train substitute models and avoiding the loss in attack transferability. Experimental results on MNIST, CIFAR-10, and ImageNet show that the proposed ZOO attack is as effective as the state-of-the-art white-box attack (e.g., Carlini and Wagner's attack) and significantly outperforms existing black-box attacks via substitute models.
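To make the gradient-estimation idea concrete, the following is a minimal sketch, not the authors' implementation, of coordinate-wise zeroth-order gradient estimation via a symmetric difference quotient combined with stochastic coordinate descent. The function names (`zoo_coordinate_gradient`, `zoo_attack_step`), the toy quadratic loss, and the parameters `lr` and `h` are illustrative assumptions; the actual attack minimizes a Carlini-Wagner-style loss built from the model's confidence scores plus a distortion penalty, and adds ADAM-style coordinate updates, dimension reduction, hierarchical attack, and importance sampling, all omitted here.

```python
import numpy as np

def zoo_coordinate_gradient(f, x, i, h=1e-4):
    """Estimate the i-th partial derivative of a black-box loss f at x
    using the symmetric difference quotient (queries f only, no backprop)."""
    e = np.zeros_like(x)
    e.flat[i] = h
    return (f(x + e) - f(x - e)) / (2.0 * h)

def zoo_attack_step(f, x, lr=0.01, h=1e-4, rng=None):
    """One zeroth-order stochastic coordinate descent step:
    pick a random coordinate, estimate its gradient, update that coordinate."""
    rng = np.random.default_rng() if rng is None else rng
    i = rng.integers(x.size)            # importance sampling would bias this choice
    g_i = zoo_coordinate_gradient(f, x, i, h)
    x = x.copy()
    x.flat[i] -= lr * g_i               # plain coordinate update; the paper also uses ADAM/Newton variants
    return x

# Toy usage: minimize a quadratic stand-in for the attack objective on model scores.
if __name__ == "__main__":
    target = np.ones(16)
    loss = lambda x: np.sum((x - target) ** 2)
    x = np.zeros(16)
    for _ in range(2000):
        x = zoo_attack_step(loss, x)
    print("final loss:", loss(x))
```

In the real attack, `loss` would be evaluated by querying the target model for its confidence scores, so each coordinate update costs two model queries; the dimension reduction and hierarchical techniques mentioned above exist precisely to keep this query count manageable on high-dimensional inputs such as ImageNet images.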

References

  1. Marco Barreno, Blaine Nelson, Anthony D. Joseph, and J. D. Tygar. 2010. The security of machine learning. Machine Learning 81, 2 (2010), 121--148.
  2. Marco Barreno, Blaine Nelson, Russell Sears, Anthony D. Joseph, and J. Doug Tygar. 2006. Can machine learning be secure? In Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security. ACM, 16--25.
  3. Dimitri P. Bertsekas. Nonlinear Programming.
  4. Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. 2013. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 387--402.
  5. Battista Biggio, Blaine Nelson, and Pavel Laskov. 2012. Poisoning attacks against support vector machines. In Proceedings of the International Conference on Machine Learning. 1467--1474.
  6. John Bradshaw, Alexander G. de G. Matthews, and Zoubin Ghahramani. 2017. Adversarial examples, uncertainty, and transfer testing robustness in Gaussian process hybrid deep networks. arXiv preprint arXiv:1707.02476 (2017).
  7. Nicholas Carlini and David Wagner. 2017. Adversarial examples are not easily detected: Bypassing ten detection methods. arXiv preprint arXiv:1705.07263 (2017).
  8. Nicholas Carlini and David Wagner. 2017. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (SP). IEEE, 39--57.
  9. Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, and Dawn Song. 2017. Robust physical-world attacks on machine learning models. arXiv preprint arXiv:1707.08945 (2017).
  10. Reuben Feinman, Ryan R. Curtin, Saurabh Shintre, and Andrew B. Gardner. 2017. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410 (2017).
  11. Saeed Ghadimi and Guanghui Lan. 2013. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization 23, 4 (2013), 2341--2368.
  12. Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
  13. Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, and Patrick McDaniel. 2017. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280 (2017).
  14. Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick McDaniel. 2016. Adversarial perturbations against deep neural networks for malware classification. arXiv preprint arXiv:1606.04435 (2016).
  15. Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
  16. Weiwei Hu and Ying Tan. 2017. Black-box attacks against RNN based malware detection algorithms. arXiv preprint arXiv:1705.08131 (2017).
  17. Weiwei Hu and Ying Tan. 2017. Generating adversarial malware examples for black-box attacks based on GAN. arXiv preprint arXiv:1702.05983 (2017).
  18. Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. 2016. Safety verification of deep neural networks. arXiv preprint arXiv:1610.06940 (2016).
  19. Jonghoon Jin, Aysegul Dundar, and Eugenio Culurciello. 2015. Robust convolutional neural networks under adversarial noise. arXiv preprint arXiv:1511.06306 (2015).
  20. Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  21. Alexey Kurakin, Ian Goodfellow, and Samy Bengio. 2016. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533 (2016).
  22. Alexey Kurakin, Ian Goodfellow, and Samy Bengio. 2016. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236 (2016).
  23. Peter D. Lax and Maria Shea Terrell. 2014. Calculus with Applications. Springer.
  24. Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436--444.
  25. Xiangru Lian, Huan Zhang, Cho-Jui Hsieh, Yijun Huang, and Ji Liu. 2016. A comprehensive linear speedup analysis for asynchronous stochastic parallel optimization from zeroth-order to first-order. In Advances in Neural Information Processing Systems. 3054--3062.
  26. Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. 2016. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770 (2016).
  27. Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017).
  28. Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. 2017. On detecting adversarial perturbations. arXiv preprint arXiv:1702.04267 (2017).
  29. Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. 2016. Universal adversarial perturbations. arXiv preprint arXiv:1610.08401 (2016).
  30. Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, Pascal Frossard, and Stefano Soatto. 2017. Analysis of universal adversarial perturbations. arXiv preprint arXiv:1705.09554 (2017).
  31. Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. 2016. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2574--2582.
  32. Yurii Nesterov et al. 2011. Random gradient-free minimization of convex functions. Technical Report. Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
  33. Nicolas Papernot and Patrick McDaniel. 2017. Extending defensive distillation. arXiv preprint arXiv:1705.05264 (2017).
  34. Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. 2016. Transferability in machine learning: From phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277 (2016).
  35. Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. 2017. Practical black-box attacks against machine learning. In Proceedings of the ACM Asia Conference on Computer and Communications Security. ACM, 506--519.
  36. Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. 2016. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy (EuroS&P). 372--387.
  37. Nicolas Papernot, Patrick McDaniel, Ananthram Swami, and Richard Harang. 2016. Crafting adversarial input sequences for recurrent neural networks. In IEEE Military Communications Conference (MILCOM). 49--54.
  38. Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. 2016. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy (SP). 582--597.
  39. Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the Inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2818--2826.
  40. Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013).
  41. Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick McDaniel. 2017. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204 (2017).
  42. Weilin Xu, David Evans, and Yanjun Qi. 2017. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155 (2017).
  43. Weilin Xu, David Evans, and Yanjun Qi. 2017. Feature squeezing mitigates and detects Carlini/Wagner adversarial examples. arXiv preprint arXiv:1705.10686 (2017).
  44. Valentina Zantedeschi, Maria-Irina Nicolae, and Ambrish Rawat. 2017. Efficient defenses against adversarial attacks. arXiv preprint arXiv:1707.06728 (2017).
  45. Stephan Zheng, Yang Song, Thomas Leung, and Ian Goodfellow. 2016. Improving the robustness of deep neural networks via stability training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4480--4488.

Published in

AISec '17: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security
November 2017, 140 pages
ISBN: 9781450352024
DOI: 10.1145/3128572

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History: Published 3 November 2017

Qualifiers

• research-article

Acceptance Rates

AISec '17 paper acceptance rate: 11 of 36 submissions, 31%. Overall acceptance rate: 94 of 231 submissions, 41%.
