ABSTRACT
Deep learning (DL) training algorithms utilize nondeterminism to improve models' accuracy and training efficiency. Hence, multiple identical training runs (i.e., runs with identical training data, algorithm, and network) produce different models with different accuracies and training times. In addition to these algorithmic factors, DL libraries (e.g., TensorFlow and cuDNN) introduce further variance (referred to as implementation-level variance) through parallelism, optimization, and floating-point computation.
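To make this concrete, the following is a minimal sketch of algorithmic nondeterminism: two training runs with identical data, architecture, and hyperparameters that differ only in random state (weight initialization, batch shuffling, dropout). The model and settings here are illustrative, not the paper's experimental setup.

```python
# Two "identical" Keras training runs typically report different test
# accuracies because weight initialization, data shuffling, and dropout
# are all seeded from fresh random state on each run.
import tensorflow as tf

def train_once():
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),  # a source of algorithmic nondeterminism
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=2, verbose=0)  # shuffles each epoch
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    return acc

print(train_once(), train_once())  # the two accuracies will generally differ
```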
This work is the first to study the variance of DL systems and the awareness of this variance among researchers and practitioners. Our experiments on three datasets with six popular networks show large overall accuracy differences among identical training runs. Even after excluding weak models, the accuracy difference is still up to 10.8%. In addition, implementation-level factors alone cause the accuracy difference across identical training runs to be up to 2.9%, the per-class accuracy difference to be up to 52.4%, and the training time difference to be up to 145.3%. All core libraries (TensorFlow, CNTK, and Theano) and low-level libraries (e.g., cuDNN) exhibit implementation-level variance across all evaluated versions.
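A sketch of the measurement style used here: repeat an identical training run N times and report the spread of accuracy and training time. The `train_once` argument is an assumption, standing in for any training routine (such as the one sketched above) that returns a test accuracy.

```python
# Quantify run-to-run variance across N identical training runs:
# the overall accuracy difference (max - min) and the relative
# training-time difference, analogous to the metrics reported above.
import time
import statistics

def measure_variance(train_once, n_runs=16):
    accs, times = [], []
    for _ in range(n_runs):
        start = time.perf_counter()
        accs.append(train_once())
        times.append(time.perf_counter() - start)
    acc_diff = max(accs) - min(accs)                    # overall accuracy difference
    time_diff = (max(times) - min(times)) / min(times)  # relative training-time difference
    print(f"accuracy diff: {acc_diff:.4f}  "
          f"time diff: {time_diff:.1%}  "
          f"acc stdev: {statistics.stdev(accs):.4f}")
    return accs, times
```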
Our researcher and practitioner survey shows that 83.8% of the 901 participants are unaware of or unsure about any implementation-level variance. In addition, our literature survey shows that only 19.5±3% of papers in recent top software engineering (SE), artificial intelligence (AI), and systems conferences use multiple identical training runs to quantify the variance of their DL approaches. This paper raises awareness of DL variance and directs SE researchers to challenging tasks such as creating deterministic DL implementations, which would facilitate debugging and improve the reproducibility of DL software and results.
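As a hedged sketch of one step toward such determinism, TensorFlow 2.8+ can fix all framework seeds and request deterministic kernels. This removes many, though not necessarily all, implementation-level sources of variance, and deterministic kernels can be slower.

```python
# Configure TensorFlow for (more) deterministic training; both calls must
# run before any model is built or any op executes.
import tensorflow as tf

tf.keras.utils.set_random_seed(42)              # seeds Python, NumPy, and TF RNGs
tf.config.experimental.enable_op_determinism()  # use deterministic ops, or raise an error
```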