DOI: 10.1145/3324884.3416545
Research article, ASE conference proceedings

Problems and opportunities in training deep learning software systems: an analysis of variance

Published: 27 January 2021

ABSTRACT

Deep learning (DL) training algorithms utilize nondeterminism to improve models' accuracy and training efficiency. Hence, multiple identical training runs (i.e., runs with identical training data, algorithm, and network) produce different models with different accuracies and training times. In addition to these algorithmic factors, DL libraries (e.g., TensorFlow and cuDNN) introduce additional variance (referred to as implementation-level variance) due to parallelism, optimization, and floating-point computation.
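The algorithmic variance described above can be sketched with a deliberately tiny, hypothetical model (not the paper's networks or datasets): two runs share the same training data and update rule but draw different random initial weights, so they learn different models.

```python
import numpy as np

def train_toy_model(seed, X, y, epochs=50, lr=0.1):
    """One 'training run' of a toy logistic-regression model: the data
    and algorithm are identical across runs, but the random weight
    initialization (an algorithmic nondeterminism factor) depends on seed."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])          # random initialization
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))     # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)     # gradient descent step
    return w

data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(100, 5))
y = (X[:, 0] > 0).astype(float)

w_a = train_toy_model(seed=1, X=X, y=y)
w_b = train_toy_model(seed=2, X=X, y=y)
# Identical data and algorithm, yet the two runs learn different weights:
print(float(np.max(np.abs(w_a - w_b))))
```

In a deep (non-convex) network this divergence compounds: runs may settle into different loss-surface minima, not merely stop at different points on the way to the same one.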

This work is the first to study the variance of DL systems and the awareness of this variance among researchers and practitioners. Our experiments on three datasets with six popular networks show large overall accuracy differences among identical training runs. Even after excluding weak models, the accuracy difference is 10.8%. In addition, implementation-level factors alone cause the accuracy difference across identical training runs to be up to 2.9%, the per-class accuracy difference to be up to 52.4%, and the training time difference to be up to 145.3%. All core libraries (TensorFlow, CNTK, and Theano) and low-level libraries (e.g., cuDNN) exhibit implementation-level variance across all evaluated versions.
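The variance metrics quoted here (overall and per-class accuracy difference across identical runs) are simple range statistics. A sketch with made-up accuracy values, not the paper's measurements:

```python
# Hypothetical test accuracies from six identical training runs.
run_accuracies = [0.912, 0.907, 0.921, 0.898, 0.915, 0.903]

# Overall accuracy difference: gap between the best and worst run.
overall_diff = max(run_accuracies) - min(run_accuracies)

# Per-class accuracy difference: the same gap, computed class by class
# (hypothetical per-class accuracies across three runs).
per_class_acc = {
    "class_0": [0.80, 0.62, 0.75],
    "class_1": [0.91, 0.89, 0.93],
}
per_class_diff = {c: max(a) - min(a) for c, a in per_class_acc.items()}

print(f"overall accuracy difference: {overall_diff:.3f}")
print(f"per-class accuracy differences: {per_class_diff}")
```

The per-class gap can dwarf the overall gap, as in this toy data: aggregate accuracy hides which classes a given run happens to serve poorly.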

Our researcher and practitioner survey shows that 83.8% of the 901 participants are unaware of or unsure about any implementation-level variance. In addition, our literature survey shows that only 19.5±3% of papers in recent top software engineering (SE), artificial intelligence (AI), and systems conferences use multiple identical training runs to quantify the variance of their DL approaches. This paper raises awareness of DL variance and directs SE researchers to challenging tasks such as creating deterministic DL implementations to facilitate debugging and improving the reproducibility of DL software and results.
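Toward the deterministic implementations the paper calls for, a common first step is pinning every seed. A minimal sketch (the framework hooks named in the comments are illustrative; removing implementation-level variance requires more than seeding):

```python
import os
import random
import numpy as np

def set_global_seeds(seed):
    """Pin common sources of algorithmic randomness. Seeding alone does
    NOT remove implementation-level variance caused by parallelism,
    floating-point reduction order, or autotuned GPU kernels."""
    os.environ["PYTHONHASHSEED"] = str(seed)  # only affects subprocesses
    random.seed(seed)
    np.random.seed(seed)
    # Framework-specific seeds would go here, e.g. tf.random.set_seed(seed)
    # or torch.manual_seed(seed), for whichever library is in use.

set_global_seeds(42)
a = np.random.rand(3)
set_global_seeds(42)
b = np.random.rand(3)
print(np.array_equal(a, b))  # True: same seed, same draws
```

Even with every seed pinned, two runs can still diverge at the implementation level, which is why the paper frames deterministic DL implementations as an open engineering challenge rather than a configuration setting.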


Published in
ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering
December 2020, 1449 pages
ISBN: 9781450367684
DOI: 10.1145/3324884

        Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Overall Acceptance Rate: 82 of 337 submissions (24%)
