Published in: Empirical Software Engineering 2/2019

20 August 2018

Preventing duplicate bug reports by continuously querying bug reports

Authors: Abram Hindle, Curtis Onuczko


Abstract

Bug deduplication, or duplicate bug report detection, is a hot topic in software engineering information retrieval research, but it is often not deployed in practice. To de-duplicate bug reports, developers typically rely on the search capabilities of the bug tracking software they use, such as Bugzilla, Jira, or GitHub Issues. These search capabilities range from simple SQL string search to the IR-based word indexing methods employed by search engines. Yet too often these searches do very little to stop the creation of duplicate bug reports: some bug trackers have more than 10% of their bug reports marked as duplicates. Perhaps these bug tracker search engines are not enough. In this paper we propose a method that attempts to prevent duplicate bug reports before they start: continuous querying. That is, as bug reporters type their report, the text is used to query the bug database for duplicate or related bug reports. Continuously querying bug reports alerts the reporter to duplicates while they are reporting the bug, rather than requiring them to formulate queries to find the duplicate. This work thus ushers in a new way of evaluating bug report deduplication techniques, as well as a new kind of bug deduplication task. We show that simple IR measures can address this problem, but also that further research is needed to refine this novel process, which can be integrated into modern bug report systems.
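
To make the idea of continuous querying concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract only says that simple IR measures can address the problem, so TF-IDF with cosine similarity is assumed here as one such measure. The names `BugIndex` and `continuous_queries`, the bug ids, and the sample reports are all hypothetical, and re-querying every few words stands in for whatever trigger a real bug tracker form would use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BugIndex:
    """Tiny in-memory TF-IDF index over existing bug reports (illustrative only)."""

    def __init__(self, reports):
        # reports: dict mapping a (hypothetical) bug id to its report text
        docs = {bug_id: Counter(tokenize(text)) for bug_id, text in reports.items()}
        n = len(docs)
        df = Counter()
        for counts in docs.values():
            df.update(counts.keys())
        # Smoothed inverse document frequency per indexed term
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
        self.vectors = {bug_id: self._weigh(c) for bug_id, c in docs.items()}

    def _weigh(self, counts):
        # Build a length-normalised TF-IDF vector; terms not in the index are dropped
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def query(self, text, k=3):
        """Return up to k (bug_id, cosine_score) pairs, best match first."""
        q = self._weigh(Counter(tokenize(text)))
        scored = []
        for bug_id, vec in self.vectors.items():
            score = sum(w * vec.get(t, 0.0) for t, w in q.items())
            if score > 0:
                scored.append((score, bug_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(bug_id, score) for score, bug_id in scored[:k]]


def continuous_queries(index, draft, step=3, k=3):
    """Simulate typing: re-query the index after every `step` words of the draft."""
    words = draft.split()
    for end in range(step, len(words) + step, step):
        prefix = " ".join(words[:end])
        yield min(end, len(words)), index.query(prefix, k)


# Hypothetical bug database and an in-progress report
REPORTS = {
    "BUG-1": "Application crashes when opening a large PDF attachment",
    "BUG-2": "Login button does nothing on the settings page",
    "BUG-3": "Dark theme colors are wrong in the editor",
}
index = BugIndex(REPORTS)
draft = "Crashes while opening large PDF attachment from email"
for words_typed, candidates in continuous_queries(index, draft):
    print(words_typed, candidates)
```

In this sketch the reporter never formulates a search query. Each partial draft is itself the query, so a likely duplicate can surface after only a few words have been typed.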


Metadata
Title
Preventing duplicate bug reports by continuously querying bug reports
Authors
Abram Hindle
Curtis Onuczko
Publication date
20 August 2018
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 2/2019
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-018-9643-4
