DOI: 10.1145/3468264.3468612
Research article · Public Access

Exposing numerical bugs in deep learning via gradient back-propagation

Published: 18 August 2021

ABSTRACT

Numerical computation dominates deep learning (DL) programs; consequently, numerical bugs are among the most prominent kinds of defects in them. Numerical bugs can produce exceptional values such as NaN (Not-a-Number) and INF (Infinity), which propagate and eventually cause crashes or invalid outputs. They occur when special inputs induce invalid parameter values at internal mathematical operations such as log(). In this paper, we propose the first dynamic technique, called GRIST, which automatically generates a small input that exposes numerical bugs in DL programs. GRIST piggy-backs on the built-in gradient-computation functionality of DL infrastructures. Our evaluation on 63 real-world DL programs shows that GRIST detects 78 bugs, including 56 previously unknown bugs. After we submitted them to the corresponding issue repositories, eight bugs were confirmed and three were fixed. Moreover, GRIST exposes numerical bugs 8.79X faster than running the original programs with their provided inputs. Compared to the state-of-the-art technique DEBAR (a static technique), DEBAR produces 12 false positives and misses 31 true bugs (30 of which GRIST finds), whereas GRIST misses only one known bug in those programs and reports no false positives. The results demonstrate the effectiveness of GRIST.
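The gradient-guided idea behind GRIST can be illustrated with a minimal sketch (this is not GRIST itself; the toy program, the fixed weights `w` and `b`, and the hand-derived gradient are all illustrative assumptions). Instead of randomly searching for an input, the search follows the gradient of the unsafe operation's operand with respect to the input, steering the input toward the operand's invalid region:

```python
import numpy as np

# Hypothetical toy "DL program": y = log(w*x + b).
# log() is only valid for positive operands, so any input x that drives
# (w*x + b) <= 0 triggers -inf/NaN -- i.e., exposes a numerical bug.
w, b = 2.0, 1.0

def operand(x):
    return w * x + b

# Gradient-guided search (sketch of the idea, not the actual tool):
# d(operand)/dx = w, so step x against the gradient to shrink the
# operand until it leaves the valid range of log().
x = 0.5                      # initial, safe input: operand(x) = 2.0
lr = 0.25
for _ in range(100):
    grad = w                 # gradient of the log() operand w.r.t. x
    x -= lr * np.sign(grad)  # descend: push the operand toward <= 0
    if operand(x) <= 0.0:    # invalid parameter value for log() found
        break

with np.errstate(divide="ignore", invalid="ignore"):
    y = np.log(operand(x))   # y is now -inf (or NaN for operand < 0)
```

In a real DL program the gradient would come from the framework's back-propagation machinery rather than being derived by hand, which is why the approach piggy-backs on the infrastructure's built-in gradient computation.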


Published in:
ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2021, 1690 pages
ISBN: 9781450385626
DOI: 10.1145/3468264
Copyright © 2021 ACM
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States


Overall Acceptance Rate: 112 of 543 submissions, 21%
