Published in: Automated Software Engineering 1/2015

01-03-2015

AutoODC: Automated generation of orthogonal defect classifications

Authors: LiGuo Huang, Vincent Ng, Isaac Persing, Mingrui Chen, Zeheng Li, Ruili Geng, Jeff Tian

Abstract

Orthogonal defect classification (ODC), the most influential framework for software defect classification and analysis, provides valuable in-process feedback to system development and maintenance. Conducting ODC classification on existing organizational defect reports is human-intensive and requires expert knowledge of both ODC and the system domain. This paper presents AutoODC, an approach for automating ODC classification by casting it as a supervised text classification problem. Rather than merely applying the standard machine learning framework to this task, we seek to acquire a better ODC classification system by integrating experts’ ODC experience and domain knowledge into the learning process through a novel relevance annotation framework. We have trained AutoODC using two state-of-the-art machine learning algorithms for text classification, Naive Bayes (NB) and support vector machine (SVM), and evaluated it on both an industrial defect report from the social network domain and a larger defect list extracted from a publicly accessible defect tracker of the open source system FileZilla. AutoODC is a promising approach: not only does it demand minimal human effort beyond the annotations typically required by standard machine learning approaches, but it also achieves overall accuracies of 82.9 % (NB) and 80.7 % (SVM) on the industrial defect report, and accuracies of 77.5 % (NB) and 75.2 % (SVM) on the larger, more diversified open source defect list.
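
As a rough illustration of what casting defect classification as supervised text classification can look like, the sketch below trains a very small multinomial Naive Bayes classifier on labeled report texts and predicts a label for a new report. It is not the AutoODC implementation, and it does not model the relevance annotation framework. The whitespace tokenizer, the add-one smoothing, the function names train_nb and predict_nb, the toy reports, and the two labels are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real preprocessing."""
    return text.lower().split()


def train_nb(examples):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    class_counts = Counter()            # number of reports per label
    word_counts = defaultdict(Counter)  # per-label word frequencies
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest add-one-smoothed log posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy reports and labels, invented purely for illustration.
TOY_REPORTS = [
    ("application crashes when uploading large file", "Reliability"),
    ("server crashes after connection timeout", "Reliability"),
    ("button label is confusing and hard to find", "Usability"),
    ("menu layout is confusing for new users", "Usability"),
]

model = train_nb(TOY_REPORTS)
print(predict_nb(model, "client crashes during upload"))
```

A realistic setup would need far more labeled reports, proper preprocessing such as stemming, and held-out evaluation before any accuracy figure could be compared with those reported above.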

Footnotes
1. Due to proprietary rules, we anonymize the industrial company by referring to it as “Company P” throughout this paper.
2. The definitions and taxonomy of ODC v5.2 attributes are accessible at http://researcher.watson.ibm.com/researcher/files/us-pasanth/ODC-5-2.pdf.
3. Elgg is an open source social networking engine. The defect (issue) tracker of Elgg can be accessed at https://github.com/Elgg/Elgg/issues.
4. Other stemmers, such as the Porter stemmer (Porter 1980), can be used, but we found that the WordNet stemmer yields slightly better accuracy.
5. FileZilla is a free FTP solution composed of three subsystems: FileZilla Client, FileZilla Server, and Other. The defect tracker for the three subsystems of FileZilla is accessible at http://trac.filezilla-project.org/wiki/Queries.
6. To train a multi-class SVM classifier, we use \(SVM^{multiclass}\) (Tsochantaridis et al. 2004). To train a multi-class NB classifier, we use the implementation in Weka.

Literature
Ahsan, S.N., Ferzund, J., Wotawa, F.: Automatic classification of software change request using multi-label machine learning methods. In: Proceedings of the 33rd IEEE Software Engineering Workshop, pp. 79–86 (2009)
Aizawa, A.: Linguistic techniques to improve the performance of automatic text categorization. In: Proceedings of NLPRS-01, 6th Natural Language Processing Pacific Rim Symposium, pp. 307–314 (2001)
Asuncion, H.U., Asuncion, A.U., Taylor, R.N.: Software traceability with topic modeling. In: Proceedings of the 32nd International Conference on Software Engineering, pp. 95–104 (2010)
Bridge, N., Miller, C.: Orthogonal defect classification: using defect data to improve software development. Softw. Qual. 3(1), 1–8 (1998)
Caropreso, M., Matwin, S., Sebastiani, F.: A learner independent evaluation of the usefulness of statistical phrases for automated text categorization. In: Chin, A.G. (ed.) Text Databases and Document Management, Theory and Practice, pp. 78–102. Idea Group Publishing, Hershey (2001)
Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: special issue on learning from imbalanced data sets. In: SIGKDD Exploration Newsletter, pp. 1–6 (2004)
Chillarege, R.: Orthogonal defect classification. In: Lyu, M. (ed.) Handbook of Software Reliability Engineering, pp. 359–400. McGraw-Hill, New York (1995)
Chillarege, R., Bhandari, I.S., Chaar, J.K., Halliday, M.J., Moebus, D.S., Ray, B.K., Wong, M.Y.: Orthogonal defect classification-a concept for in-process measurements. IEEE Trans. Softw. Eng. 18(11), 943–956 (1992)
Chillarege, R., Biyani, S.: Identifying risk using odc based growth models. In: Proceedings of the 5th International Symposium on Software Reliability Engineering, pp. 282–288 (1994)
Cubranic, D., Murphy, G.C.: Automatic bug triage using text categorization. In: Proceedings of the 6th International Conference on Software Engineering and Knowledge Engineering, pp. 92–97 (2004)
Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
Gegick, M., Rotella, P., Xie, T.: Identifying security bug reports via text mining: an industrial case study. In: Proceedings of the 7th IEEE Working Conference on Mining Software Repositories, pp. 11–20 (2010)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: an update. ACM SIGKDD Explor. Newslett. 11(1), 10–18 (2009)
Huang, J., Czauderna, A., Gibiec, M., Emenecker, J.: A machine learning approach for tracing regulatory codes to product specific requirements. In: Proceedings of the 32nd International Conference on Software Engineering, pp. 155–164 (2010)
Hussain, I., Ormandjieva, O., Kosseim, L.: Automatic quality assessment of srs text by means of a decision-tree-based text classifier. In: Proceedings of the 7th International Conference on Quality Software, pp. 209–218 (2007)
Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Proceedings of the 10th European Conference on Machine Learning, pp. 137–142. Springer, Berlin (1998)
Kiekel, P., Cooke, N., Foltz, P., Gorman, J., Martin, M.: Some promising results of communication-based automatic measures of team cognition. In: Proceedings of Human Factors and Ergonomics Society: 46th Annual Meeting, pp. 298–302 (2002)
Ko, A., Myers, B.: A linguistic analysis of how people describe software problems. In: IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 127–134 (2006)
Lamkanfi, A., Demeyer, S., Giger, E., Goethals, B.: Predicting the severity of a reported bug. In: Proceedings of the 7th IEEE Working Conference on Mining Software Repositories, pp. 1–10 (2010)
Lin, Z., Ng, H.T., Kan, M.Y.: A pdtb-styled end-to-end discourse parser. Nat. Lang. Eng. 20, 151–184 (2014)
Lutz, R., Mikulski, C.: Empirical analysis of safety-critical anomalies during operations. IEEE Trans. Softw. Eng. 30(3), 172–180 (2004)
Lutz, R., Mikulski, C.: Ongoing requirements discovery in high integrity systems. IEEE Softw. 21(2), 19–25 (2004)
Ma, L., Tian, J.: Analyzing errors and referral pairs to characterize common problems and improve web reliability. In: Proceedings of the 3rd International Conference on Web Engineering, pp. 314–323 (2003)
Ma, L., Tian, J.: Web error classification and analysis for reliability improvement. J. Syst. Softw. 80(6), 795–804 (2007)
Mays, R., Jones, C., Holloway, G., Stundisky, D.: Experiences with defects prevention process. IBM Syst. J. 29(1), 4–32 (1990)
Menzies, T., Lutz, R., Mikulski, C.: Better analysis of defect data at NASA. In: Proceedings of the 5th International Conference on Software Engineering and Knowledge Engineering, pp. 607–611 (2003)
Menzies, T., Marcus, A.: Automated severity assessment of software defect reports. In: Proceedings of the International Conference on Software Maintenance, pp. 346–355 (2008)
Ormandjieva, O., Kosseim, L., Hussain, I.: Toward a text classification system for the quality assessment of software requirements written in natural language. In: Proceedings of the 4th International Workshop on Software Quality Assurance, pp. 39–45 (2007)
Pandita, R., Xiao, X., Yang, W., Enck, W., Xie, T.: Whyper: towards automating risk assessment of mobile application. In: Proceedings of 22nd USENIX Security Symposium, pp. 527–542 (2013)
Polpinij, J., Ghose, A.: An automatic elaborate requirement specification by using hierarchical text classification. In: Proceedings of the 2008 International Conference on Computer Science and Software Engineering, pp. 706–709 (2008)
Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)
Rennie, J.D., Shih, L., Teevan, J., Karger, D.R.: Tackling the poor assumption of naive bayes text classifiers. In: Proceedings of International Conference on Machine Learning, pp. 616–623 (2003)
Romano, D., Pinzger, M.: A comparison of event models for naive bayes text classification. In: Proceedings of AAAI Workshop on Learning for Text Categorization, pp. 41–48 (1998)
Sebastiani, F.: Text categorization. In: Zanasi, A. (ed.) Text Mining and Its Applications, pp. 109–129. MIT Press, Cambridge (2005)
Swigger, K., Brazile, R., Dafoulas, G., Serce, F.C., Alpaslan, F.N., Lopez, V.: Using content and text classification methods to characterize team performance. In: Proceedings of the 5th International Conference on Global Software Engineering, pp. 192–200 (2010)
Tamrawi, A., Nguyen, T.T., Al-Kofahi, J., Nguyen, T.N.: Fuzzy set-based automatic bug triaging. In: Proceedings of the 33rd International Conference on Software Engineering, pp. 884–887 (2011)
Thung, F., Lo, D., Jiang, L.: Automatic defect categorization. In: Proceedings of 19th Working Conference on Reverse Engineering, pp. 205–214 (2012)
Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. J. Mach. Learn. Res. 2, 45–66 (2001)
Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y.: Support vector machine learning for interdependent and structured output spaces. In: Proceedings of the 21st International Conference on Machine Learning, pp. 104–112 (2004)
Yang, C., Hou, C., Kao, W., Chen, I.: An empirical study on improving severity prediction of defect reports using feature selection. In: Proceedings of the 19th Asia-Pacific Software Engineering Conference, pp. 240–249 (2012)
Zheng, J., Williams, L., Nagappan, N., Hudpohl, J.: On the value of static analysis tools for fault detection. IEEE Trans. Softw. Eng. 32(4), 240–253 (2006)
Metadata
Title: AutoODC: Automated generation of orthogonal defect classifications
Authors: LiGuo Huang, Vincent Ng, Isaac Persing, Mingrui Chen, Zeheng Li, Ruili Geng, Jeff Tian
Publication date: 01-03-2015
Publisher: Springer US
Published in: Automated Software Engineering, Issue 1/2015
Print ISSN: 0928-8910
Electronic ISSN: 1573-7535
DOI: https://doi.org/10.1007/s10515-014-0155-1
