ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models

Authors:
Pin-Yu Chen (IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA)
Huan Zhang (University of California, Davis, Davis, CA & IBM T. J. Watson Research Center, Yorktown Heights, NY, USA)
Yash Sharma (IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA)
Jinfeng Yi (IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA)
Cho-Jui Hsieh (University of California, Davis, Davis, CA, USA)

AISec '17: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, November 2017, Pages 15–26
https://doi.org/10.1145/3128572.3140448

Published: 3 November 2017

ABSTRACT

Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has raised growing concern about their robustness to adversarial examples, especially in security-critical tasks such as traffic sign identification for autonomous driving. Studies have exposed the vulnerability of well-trained DNNs by generating barely noticeable (to both humans and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable: an attacker can train a substitute model that imitates the target model and attack the substitute instead, which is known as a black-box attack on DNNs.

As in the substitute-model setting, in this paper we propose an effective black-box attack that has access only to the input (images) and the output (confidence scores) of a targeted DNN. However, instead of leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks that directly estimate the gradients of the targeted DNN to generate adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to efficiently attack black-box models. By exploiting zeroth order optimization, improved attacks on the targeted DNN can be accomplished, sparing the need for training substitute models and avoiding the loss in attack transferability. Experimental results on MNIST, CIFAR10 and ImageNet show that the proposed ZOO attack is as effective as the state-of-the-art white-box attack (e.g., Carlini and Wagner's attack) and significantly outperforms existing black-box attacks via substitute models.
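The core idea of estimating gradients from function values alone can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not spell out: the gradient estimator is a symmetric finite difference, the update is a plain gradient step on one randomly chosen coordinate, and a toy quadratic stands in for the attack loss on a DNN. The names `estimate_coordinate_gradient`, `zeroth_order_coordinate_descent` and `black_box_loss`, as well as the values of `h`, `lr` and `steps`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_coordinate_gradient(f, x, i, h=1e-4):
    """Estimate df/dx_i from two queries to f (symmetric difference quotient)."""
    e = np.zeros_like(x)
    e[i] = h
    return (f(x + e) - f(x - e)) / (2.0 * h)


def zeroth_order_coordinate_descent(f, x0, steps=2000, lr=0.1, h=1e-4, seed=0):
    """Minimize f using only function evaluations, one random coordinate per step."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        i = rng.integers(x.size)  # pick a coordinate uniformly at random
        g = estimate_coordinate_gradient(f, x, i, h)
        x[i] -= lr * g  # gradient step on that coordinate only
    return x


# Toy stand-in for an attack loss (assumption): the optimizer only ever
# sees its output values, never its analytic gradient.
target = np.array([1.0, -2.0, 0.5])


def black_box_loss(x):
    return float(np.sum((x - target) ** 2))


x_final = zeroth_order_coordinate_descent(black_box_loss, np.zeros(3))
print(x_final)
```

Each step costs two queries to the black-box function, so the query budget grows with the number of coordinates updated. That cost is the motivation for techniques such as dimension reduction and importance sampling, which this sketch does not implement.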

References

Marco Barreno, Blaine Nelson, Anthony D. Joseph, and J. D. Tygar. 2010. The security of machine learning. Machine Learning, Vol. 81, 2 (2010), 121–148.
Marco Barreno, Blaine Nelson, Russell Sears, Anthony D. Joseph, and J. Doug Tygar. 2006. Can machine learning be secure? In Proceedings of the 2006 ACM Symposium on Information, computer and communications security. ACM, 16–25.
Dimitri P Bertsekas. Nonlinear programming.
Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. 2013. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 387–402.
Battista Biggio, Blaine Nelson, and Pavel Laskov. 2012. Poisoning Attacks Against Support Vector Machines. In Proceedings of the International Conference on Machine Learning. 1467–1474.
John Bradshaw, Alexander G. de G. Matthews, and Zoubin Ghahramani. 2017. Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks. arXiv preprint arXiv:1707.02476 (2017).
Nicholas Carlini and David Wagner. 2017. Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods. arXiv preprint arXiv:1705.07263 (2017).
Nicholas Carlini and David Wagner. 2017. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (SP). IEEE, 39–57.
Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, and Dawn Song. 2017. Robust Physical-World Attacks on Machine Learning Models. arXiv preprint arXiv:1707.08945 (2017).
Reuben Feinman, Ryan R. Curtin, Saurabh Shintre, and Andrew B. Gardner. 2017. Detecting Adversarial Samples from Artifacts. arXiv preprint arXiv:1703.00410 (2017).
Saeed Ghadimi and Guanghui Lan. 2013. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, Vol. 23, 4 (2013), 2341–2368.
Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, and Patrick McDaniel. 2017. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280 (2017).
Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick McDaniel. 2016. Adversarial perturbations against deep neural networks for malware classification. arXiv preprint arXiv:1606.04435 (2016).
Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
Weiwei Hu and Ying Tan. 2017. Black-Box Attacks against RNN based Malware Detection Algorithms. arXiv preprint arXiv:1705.08131 (2017).
Weiwei Hu and Ying Tan. 2017. Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN. arXiv preprint arXiv:1702.05983 (2017).
Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. 2016. Safety verification of deep neural networks. arXiv preprint arXiv:1610.06940 (2016).
Jonghoon Jin, Aysegul Dundar, and Eugenio Culurciello. 2015. Robust convolutional neural networks under adversarial noise. arXiv preprint arXiv:1511.06306 (2015).
Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
Alexey Kurakin, Ian Goodfellow, and Samy Bengio. 2016. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533 (2016).
Alexey Kurakin, Ian Goodfellow, and Samy Bengio. 2016. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236 (2016).
Peter D. Lax and Maria Shea Terrell. 2014. Calculus with applications. Springer.
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature, Vol. 521, 7553 (2015), 436–444.
Xiangru Lian, Huan Zhang, Cho-Jui Hsieh, Yijun Huang, and Ji Liu. 2016. A comprehensive linear speedup analysis for asynchronous stochastic parallel optimization from zeroth-order to first-order. In Advances in Neural Information Processing Systems. 3054–3062.
Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. 2016. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770 (2016).
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv preprint arXiv:1706.06083 (2017).
Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. 2017. On detecting adversarial perturbations. arXiv preprint arXiv:1702.04267 (2017).
Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. 2016. Universal adversarial perturbations. arXiv preprint arXiv:1610.08401 (2016).
Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, Pascal Frossard, and Stefano Soatto. 2017. Analysis of universal adversarial perturbations. arXiv preprint arXiv:1705.09554 (2017).
Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. 2016. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2574–2582.
Yurii Nesterov, et al. 2011. Random gradient-free minimization of convex functions. Technical Report. Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
Nicolas Papernot and Patrick McDaniel. 2017. Extending Defensive Distillation. arXiv preprint arXiv:1705.05264 (2017).
Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. 2016. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277 (2016).
Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. 2017. Practical black-box attacks against machine learning. In Proceedings of the ACM on Asia Conference on Computer and Communications Security. ACM, 506–519.
Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. 2016. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy (EuroS&P). 372–387.
Nicolas Papernot, Patrick McDaniel, Ananthram Swami, and Richard Harang. 2016. Crafting adversarial input sequences for recurrent neural networks. In IEEE Military Communications Conference (MILCOM). 49–54.
Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. 2016. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy (SP). 582–597.
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2818–2826.
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013).
Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick McDaniel. 2017. Ensemble Adversarial Training: Attacks and Defenses. arXiv preprint arXiv:1705.07204 (2017).
Weilin Xu, David Evans, and Yanjun Qi. 2017. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. arXiv preprint arXiv:1704.01155 (2017).
Weilin Xu, David Evans, and Yanjun Qi. 2017. Feature Squeezing Mitigates and Detects Carlini/Wagner Adversarial Examples. arXiv preprint arXiv:1705.10686 (2017).
Valentina Zantedeschi, Maria-Irina Nicolae, and Ambrish Rawat. 2017. Efficient Defenses Against Adversarial Attacks. arXiv preprint arXiv:1707.06728 (2017).
Stephan Zheng, Yang Song, Thomas Leung, and Ian Goodfellow. 2016. Improving the robustness of deep neural networks via stability training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4480–4488.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Security and privacy
  1. Software and application security

Published in
AISec '17: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security
November 2017
140 pages
ISBN: 9781450352024
DOI: 10.1145/3128572
General Chair: Bhavani Thuraisingham (University of Texas at Dallas, USA)
Program Chairs: Battista Biggio (Pluribus One and University of Cagliari, Italy), David Mandell Freeman (Facebook Inc., USA), Brad Miller (Google Inc., USA), Arunesh Sinha (University of Michigan, Ann Arbor, USA)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
adversarial learning
black-box attack
deep learning
neural network
substitute model
Qualifiers
- research-article
Acceptance Rates
AISec '17 Paper Acceptance Rate: 11 of 36 submissions, 31%
Overall Acceptance Rate: 94 of 231 submissions, 41%