ABSTRACT
Software fault prediction is an important practice for improving software quality and reliability. Predicting which components of a large software system are likely to contain the most faults in the next release helps teams manage projects, for example by estimating possible release delays early, and affordably guides corrective actions to improve the quality of the software. Developing robust fault prediction models is challenging, however, and many techniques have been proposed in the literature. Traditional fault prediction studies focus mainly on manually designed features (e.g. complexity metrics), which are fed into machine learning classifiers to identify defective code. These features often fail to capture the semantic and structural information of programs, which is needed to build accurate fault prediction models. In this survey, we discuss various approaches to fault prediction and explain how, in recent studies, deep learning algorithms help bridge the gap between program semantics and fault prediction features and make accurate predictions.
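The limitation of hand-crafted features can be illustrated with a small sketch. It is not taken from the survey: the two snippets, the function names `handcrafted_metrics` and `call_sequence`, and the choice of metrics are all assumptions made for illustration. Two versions of a function receive identical count-based metrics even though the order of their operations, which is structural information, differs.

```python
import ast

# Two hypothetical snippets: same statements, different order.
SAFE = """
def process(queue, item):
    queue.append(item)
    if queue:
        queue.pop()
"""

RISKY = """
def process(queue, item):
    queue.pop()
    if queue:
        queue.append(item)
"""


def handcrafted_metrics(source: str) -> dict:
    """Count-based features of the kind fed to traditional classifiers."""
    nodes = list(ast.walk(ast.parse(source)))
    return {
        # Non-blank lines of code.
        "loc": sum(1 for line in source.splitlines() if line.strip()),
        # Branching and looping constructs as a crude complexity proxy.
        "branches": sum(isinstance(n, (ast.If, ast.For, ast.While)) for n in nodes),
        # Number of call expressions.
        "calls": sum(isinstance(n, ast.Call) for n in nodes),
    }


def call_sequence(source: str) -> list:
    """Method names in source order, a simple order-preserving representation."""
    names = []

    def visit(node):
        # Record method calls such as queue.pop() before descending further.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            names.append(node.func.attr)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return names


if __name__ == "__main__":
    print(handcrafted_metrics(SAFE) == handcrafted_metrics(RISKY))
    print(call_sequence(SAFE), call_sequence(RISKY))
```

A classifier that sees only the metric dictionary cannot tell the two versions apart, whereas a model that consumes the ordered sequence at least has the information needed to do so. Learned representations over token or syntax-tree sequences build on this kind of input.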
Index Terms
- Deep Learning for Software Defect Prediction: A Survey