Published in: Knowledge and Information Systems 6/2020

12 February 2020 | Regular paper

Enhancing supervised bug localization with metadata and stack-trace

Authors: Yaojing Wang, Yuan Yao, Hanghang Tong, Xuan Huo, Ming Li, Feng Xu, Jian Lu


Abstract

Locating relevant source files for a given bug report is an important task in software development and maintenance. To make the locating process easier, information retrieval methods have been widely used to compute the content similarities between bug reports and source files. In addition to content similarities, various other sources of information such as the metadata and the stack-trace in the bug report can be used to enhance the localization accuracy. In this paper, we propose a supervised topic modeling approach for automatically locating the relevant source files of a bug report. In our approach, we take into account the following five key observations. First, supervised modeling can effectively make use of the existing fixing histories. Second, certain words in bug reports tend to appear multiple times in their relevant source files. Third, longer source files tend to have more bugs. Fourth, metadata provides additional guidance on the search space. Fifth, buggy source files may already appear in the stack-trace. By integrating the above five observations, we experimentally show that the proposed method can achieve up to 67.1% improvement in terms of prediction accuracy over its best competitors and scales linearly with the size of the data.
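To make three of these observations concrete (content similarity, the file-length tendency, and stack-trace hits), the following is a minimal Python sketch of a hand-weighted ranking heuristic. It is not the supervised topic model proposed in the paper: the function names, the weights `w_len` and `w_trace`, and the toy source files are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(report_text, stack_trace_files, sources, w_len=0.1, w_trace=0.5):
    """Rank source files for one bug report, best candidate first.

    score = content similarity
            + w_len   * normalized log file length
            + w_trace * (1 if the file is named in the stack-trace else 0)

    The weights are hypothetical, hand-picked values (assumption), not
    parameters learned from fixing histories as in a supervised model.
    """
    if not sources:
        return []
    report_tf = Counter(tokenize(report_text))
    tokens_by_file = {name: tokenize(body) for name, body in sources.items()}
    # Guard against all-empty files so the log normalizer is never zero.
    max_len = max(1, max(len(toks) for toks in tokens_by_file.values()))
    scores = {}
    for name, toks in tokens_by_file.items():
        similarity = cosine(report_tf, Counter(toks))
        length_prior = math.log1p(len(toks)) / math.log1p(max_len)
        trace_hit = 1.0 if name in stack_trace_files else 0.0
        scores[name] = similarity + w_len * length_prior + w_trace * trace_hit
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical file names and contents).
sources = {
    "Parser.java": "class Parser { void parse(String input) { tokenizer.next(); } }",
    "Renderer.java": "class Renderer { void draw(Canvas canvas) { canvas.paint(); } }",
    "Cache.java": "class Cache { Object get(String key) { return map.get(key); } }",
}
report = "NullPointerException when parse is called with empty input"
ranked = rank_files(report, {"Parser.java"}, sources)
print(ranked)
```

A supervised approach would instead learn how much each signal matters from previously fixed bug reports; the fixed weights here only show how the separate signals can contribute to a single ranking score.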


Footnotes
1. This work is an extended version of our previous work [3] which considers the previous three components. Please refer to the related work section for more details.
2. In this paper, we interchangeably use ‘document’ and ‘bug report.’
3. A bug report may relate to multiple source files.
4. To simplify the processing of source files, we only keep the words in source files that have appeared in the bug reports.
5. We incorporate these four terms in the model for completeness.
8. This is exactly the STMLocator method in the previous conference version [3].
 
References
1. Le T-DB, Thung F, Lo D (2017) Will this localization tool be effective for this bug? Mitigating the impact of unreliability of information retrieval based bug localization tools. Empir Softw Eng 22(4):2237–2279
2. Zhang X, Yao Y, Wang Y, Xu F, Lu J (2017) Exploring metadata in bug reports for bug localization. In: Asia-Pacific software engineering conference (APSEC), 2017 24th. IEEE, pp 328–337
3. Wang Y, Yao Y, Tong H, Huo X, Li M, Xu F, Lu J (2018) Bug localization via supervised topic modeling. In: ICDM
4. Zhou J, Zhang H, Lo D (2012) Where should the bugs be fixed? more accurate information retrieval-based bug localization based on bug reports. In: ICSE. IEEE, pp 14–24
5. Saha RK, Lease M, Khurshid S, Perry DE (2013) Improving bug localization using structured information retrieval. In: ASE. IEEE, pp 345–355
6. Wang S, Lo D (2014) Version history, similar report, and structure: putting them together for improved bug localization. In: ICPC. ACM, pp 53–63
7. Wang S, Khomh F, Zou Y (2013) Improving bug localization using correlations in crash reports. In: MSR. IEEE, pp 247–256
8. Moreno L, Treadway JJ, Marcus A, Shen W (2014) On the use of stack traces to improve text retrieval-based bug localization. In: 2014 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 151–160
9. Wong C-P, Xiong Y, Zhang H, Hao D, Zhang L, Mei H (2014) Boosting bug-report-oriented fault localization with segmentation and stack-trace analysis. In: 2014 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 181–190
10. Ye X, Shen H, Ma X, Bunescu R, Liu C (2016) From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the 38th international conference on software engineering. ACM, pp 404–415
11. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119
12. Wu R, Zhang H, Cheung S-C, Kim S (2014) Crashlocator: locating crashing faults based on crash stacks. In: Proceedings of the 2014 international symposium on software testing and analysis. ACM, pp 204–214
13. Ye X, Bunescu R, Liu C (2014) Learning to rank relevant files for bug reports using domain knowledge. In: The foundations of software engineering. ACM, pp 689–699
14. Xia X, Lo D, Shihab E, Wang X, Zhou B (2015) Automatic, high accuracy prediction of reopened bugs. Autom Softw Eng 22(1):75–109
15. Ashok B, Joy J, Liang H, Rajamani SK, Srinivasa G, Vangala V (2009) Debugadvisor: a recommender system for debugging. In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, pp 373–382
16. Shepherd D, Fry ZP, Hill E, Pollock L, Vijay-Shanker K (2007) Using natural language program analysis to locate and understand action-oriented concerns. In: Proceedings of the 6th international conference on Aspect-oriented software development. ACM, pp 212–224
17. Saha RK, Lawall J, Khurshid S, Perry DE (2014) On the effectiveness of information retrieval based bug localization for c programs. In: 2014 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 161–170
18. Lukins SK, Kraft NA, Etzkorn LH (2010) Bug localization using latent Dirichlet allocation. Inf Softw Technol 52(9):972–990
19. Nguyen AT, Nguyen TT, Al-Kofahi J, Nguyen HV, Nguyen TN (2011) A topic-based approach for narrowing the search space of buggy files from a bug report. In: ASE. IEEE, pp 263–272
20. Kim D, Tao Y, Kim S, Zeller A (2013) Where should we fix this bug? A two-phase recommendation model. IEEE Trans Softw Eng 39(11):1597–1610
21. Liu C, Yan X, Fei L, Han J, Midkiff SP (2005) Sober: statistical model-based bug localization. In: ACM SIGSOFT Software Engineering Notes, vol 30. ACM, pp 286–295
22. Poshyvanyk D, Gueheneuc Y-G, Marcus A, Antoniol G, Rajlich V (2007) Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval. IEEE Trans Softw Eng 33(6):420–432
23. Youm KC, Ahn J, Kim J, Lee E (2015) Bug localization based on code change histories and bug reports. In: APSEC, pp 190–197
24. Lam AN, Nguyen AT, Nguyen HA, Nguyen TN (2015) Combining deep learning with information retrieval to localize buggy files for bug reports. In: ASE. IEEE, pp 476–481
25. Huo X, Li M, Zhou Z-H (2016) Learning unified features from natural and programming languages for locating buggy source code. In: IJCAI, pp 1606–1612
26. Huo X, Li M (2017) Enhancing the unified features to locate buggy files by exploiting the sequential nature of source code. In: Proceedings of the 26th international joint conference on artificial intelligence. AAAI Press, pp 1909–1915
27. Lam AN, Nguyen AT, Nguyen HA, Nguyen TN (2017) Bug localization with combination of deep learning and information retrieval. In: 2017 IEEE/ACM 25th International Conference on program comprehension (ICPC). IEEE, pp 218–229
28. Xiao Y, Keung J, Mi Q, Bennin KE (2017) Improving bug localization with an enhanced convolutional neural network. In: 2017 24th Asia-Pacific software engineering conference (APSEC). IEEE, pp 338–347
29. Xiao Y, Keung J, Bennin KE, Mi Q (2018) Machine translation-based bug localization technique for bridging lexical gap. Inf Softw Technol 99:58–61
30. Liblit B, Naik M, Zheng AX, Aiken A, Jordan MI (2005) Scalable statistical bug isolation. ACM SIGPLAN Not 40(6):15–26
31. Liu C, Fei L, Yan X, Han J, Midkiff SP (2006) Statistical debugging: a hypothesis testing-based approach. IEEE Trans Softw Eng 32(10):831–848
32. Jones JA, Harrold MJ (2005) Empirical evaluation of the tarantula automatic fault-localization technique. In: ASE. ACM, pp 273–282
33. Abreu R, Zoeteweij P, Van Gemund AJ (2007) On the accuracy of spectrum-based fault localization. In: TAICPART-MUTATION. IEEE 2007, pp 89–98
34. Xuan J, Monperrus M (2014) Learning to combine multiple ranking metrics for fault localization. In: ICSME
35. Ren X, Shah F, Tip F, Ryder BG, Chesley O (2004) Chianti: a tool for change impact analysis of java programs. In: ACM Sigplan Notices, vol. 39, no. 10. ACM, pp 432–448
36. Chesley OC, Ren X, Ryder BG, Tip F (2007) Crisp—a fault localization tool for java programs. In: 29th international conference on software engineering, 2007 (ICSE 2007). IEEE, pp 775–779
37. Brun Y, Ernst MD (2004) Finding latent code errors via machine learning over program executions. In: Proceedings of 26th international conference on software engineering, 2004 (ICSE 2004). IEEE, pp 480–490
38. Le T-DB, Oentaryo RJ, Lo D (2015) Information retrieval and spectrum based bug localization: better together. In: FSE. ACM, pp 579–590
39. Hoang TV-D, Oentaryo RJ, Le T-DB, Lo D (2018) Network-clustered multi-modal bug localization. IEEE Trans Softw Eng 45(10):1002–1023
40. Weiser M (1982) Programmers use slices when debugging. Commun ACM 25(7):446–452
41. Manevich R, Sridharan M, Adams S, Das M, Yang Z (2004) Pse: explaining program failures via postmortem static analysis. In: ACM SIGSOFT software engineering notes, vol 29, no. 6. ACM, pp 63–72
42. Acharya M, Robinson B (2011) Practical change impact analysis based on static program slicing for industrial software systems. In: Proceedings of the 33rd international conference on software engineering. ACM, pp 746–755
43. Jeong G, Kim S, Zimmermann T (2009) Improving bug triage with bug tossing graphs. In: ESEC/FSE. ACM, pp 111–120
44. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
45. Ramage D, Hall D, Nallapati R, Manning CD (2009) Labeled lda: a supervised topic model for credit attribution in multi-labeled corpora. In: EMNLP. Association for Computational Linguistics, pp 248–256
46. Asuncion A, Welling M, Smyth P, Teh YW (2009) On smoothing and inference for topic models. In: Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. The Association for Uncertainty in Artificial Intelligence Press, pp 27–34
47. Porteous I, Newman D, Ihler A, Asuncion A, Smyth P, Welling M (2008) Fast collapsed Gibbs sampling for latent Dirichlet allocation. In: KDD. ACM, pp 569–577
48. Si X, Sun M (2009) Tag-lda for scalable real-time tag recommendation. J Comput Inf Syst 6(1):23–31
49. Salton G, Wong A, Yang C-S (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620
Metadata
Title
Enhancing supervised bug localization with metadata and stack-trace
Authors
Yaojing Wang
Yuan Yao
Hanghang Tong
Xuan Huo
Ming Li
Feng Xu
Jian Lu
Publication date
12 February 2020
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 6/2020
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-019-01426-2
