Abstract
Nowadays, there is growing demand in both industry and academia for exploiting Deep Learning (DL) to solve complex real-world problems. A DL program encodes the network structure of a desirable DL model and the process by which the model learns from the training dataset. Like any software, DL programs can be faulty, which poses substantial challenges for software quality assurance, especially in safety-critical domains. It is therefore crucial to equip DL development teams with efficient fault detection techniques and tools. In this article, we propose NeuraLint, a model-based fault detection approach for DL programs that uses meta-modeling and graph transformations. First, we design a meta-model for DL programs that captures their base skeleton and fundamental properties. Then, we construct a graph-based verification process covering 23 rules defined on top of the meta-model and implemented as graph transformations to detect faults and design inefficiencies in the generated models (i.e., instances of the meta-model). We evaluate the proposed approach first on 28 synthesized examples built from common problems reported in the literature, and then on 34 real-world DL programs extracted from Stack Overflow posts and GitHub repositories, in which NeuraLint successfully finds 64 faults and design inefficiencies. The results show that NeuraLint effectively detects faults and design issues in both synthesized and real-world examples, with a recall of 70.5% and a precision of 100%. Although the proposed meta-model is designed for feedforward neural networks, it can be extended to support other architectures such as recurrent neural networks. Researchers can also expand our set of verification rules to cover more types of issues in DL programs.
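To illustrate the idea of rule-based verification over a model of a DL program, the sketch below is a minimal, hypothetical Python example (not NeuraLint's actual implementation or its graph-transformation engine). It abstracts a feedforward network as an ordered list of layer nodes and checks two well-known issues: a dropout layer placed immediately before batch normalization (a documented design inefficiency), and a multi-class output layer that does not use softmax. All function and field names here are assumptions made for illustration.

```python
# Illustrative sketch only: a DL program's network structure is abstracted
# as a sequence of layer nodes, and each verification rule is a pattern
# match over that sequence. NeuraLint itself encodes such rules as graph
# transformations over instances of its meta-model.

def layer_pairs(layers):
    """Pair each layer node with its successor (None for the last layer)."""
    return list(zip(layers, layers[1:] + [None]))

def check_dropout_before_batchnorm(layers):
    """Rule sketch: dropout immediately before batch normalization causes
    a train/test variance shift and is flagged as a design inefficiency."""
    issues = []
    for node, succ in layer_pairs(layers):
        if node["type"] == "Dropout" and succ and succ["type"] == "BatchNorm":
            issues.append("dropout placed immediately before batch normalization")
    return issues

def check_last_layer_activation(layers, num_classes):
    """Rule sketch: a multi-class classifier should end with softmax."""
    last = layers[-1]
    if num_classes > 2 and last.get("activation") != "softmax":
        return ["multi-class output layer should use softmax"]
    return []

# A toy faulty model: both rules above fire on it.
model = [
    {"type": "Dense", "units": 128, "activation": "relu"},
    {"type": "Dropout", "rate": 0.5},
    {"type": "BatchNorm"},
    {"type": "Dense", "units": 10, "activation": "sigmoid"},
]

for fault in check_dropout_before_batchnorm(model) + check_last_layer_activation(model, 10):
    print("fault:", fault)
```

In the actual approach, such checks are expressed declaratively as graph-transformation rules applied to a graph extracted from the DL program, rather than as hand-written Python predicates.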
Index Terms
- Automatic Fault Detection for Deep Learning Programs Using Graph Transformations