Published in: Empirical Software Engineering 5/2018

23 January 2018

Revisiting the performance of automated approaches for the retrieval of duplicate reports in issue tracking systems that perform just-in-time duplicate retrieval

Authors: Mohamed Sami Rakha, Cor-Paul Bezemer, Ahmed E. Hassan

Abstract

Issue tracking systems (ITSs) allow software end-users and developers to file issue reports and change requests. Multiple reports are frequently filed for the same software issue. Retrieving these duplicate issue reports is a tedious manual task. Prior research proposed several automated approaches for the retrieval of duplicate issue reports. Recent versions of ITSs added a feature that performs basic retrieval of duplicate issue reports at the filing time of an issue report, in an effort to avoid the filing of duplicates as early as possible. This paper investigates the impact of this just-in-time duplicate retrieval on the duplicate reports that end up in the ITS of an open source project. In particular, we study the differences between duplicate reports for open source projects before and after the activation of this new feature. We show how the experimental results of prior research would vary given the new data after the activation of the just-in-time duplicate retrieval feature. We study duplicate issue reports from the Mozilla-Firefox, Mozilla-Core and Eclipse-Platform projects. In addition, we compare the performance of two popular state-of-the-art approaches for the automated retrieval of duplicate reports (i.e., BM25F and REP). We find that duplicate issue reports after the activation of the just-in-time duplicate retrieval feature are less textually similar, have a greater identification delay and require more discussion to be retrieved as duplicate reports than duplicates before the activation of the feature. Prior work showed that REP outperforms BM25F in terms of recall rate and mean average precision. We observe that the performance gap between BM25F and REP becomes even larger after the activation of the just-in-time duplicate retrieval feature.
We recommend that future studies focus on duplicates that were reported after the activation of the just-in-time duplicate retrieval feature, as these duplicates are more representative of future incoming issue reports and therefore give a better representation of the future performance of proposed approaches.
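
The abstract compares BM25F and REP in terms of recall rate and mean average precision. As an illustration only, the following sketch shows one common way such metrics can be computed over ranked candidate lists. The function names, the toy report identifiers, and the exact metric definitions used here are assumptions of this sketch and are not taken from the paper.

```python
from typing import Dict, List, Set


def recall_rate_at_k(rankings: Dict[str, List[str]],
                     masters: Dict[str, Set[str]],
                     k: int) -> float:
    """Fraction of duplicate reports with a correct master in the top k."""
    hits = sum(
        1 for duplicate, ranked in rankings.items()
        if masters[duplicate] & set(ranked[:k])
    )
    return hits / len(rankings)


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of precision values at each rank where a relevant report appears."""
    found = 0
    total = 0.0
    for position, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            found += 1
            total += found / position
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings: Dict[str, List[str]],
                           masters: Dict[str, Set[str]]) -> float:
    """Mean of the per-duplicate average precision values."""
    scores = [average_precision(ranked, masters[duplicate])
              for duplicate, ranked in rankings.items()]
    return sum(scores) / len(scores)


# Hypothetical toy data: ranked candidate lists for two duplicate reports.
rankings = {
    "dup1": ["a", "b", "c"],  # correct master "b" is ranked second
    "dup2": ["x", "y", "z"],  # correct master "q" is never retrieved
}
masters = {"dup1": {"b"}, "dup2": {"q"}}

print(recall_rate_at_k(rankings, masters, 1))     # 0.0
print(recall_rate_at_k(rankings, masters, 2))     # 0.5
print(mean_average_precision(rankings, masters))  # 0.25
```

In this toy example, only one of the two duplicates has its master in the top two candidates, so the recall rate at k = 2 is 0.5, while the mean average precision also reflects that the master was found at rank 2 rather than rank 1.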

Metadata
Title
Revisiting the performance of automated approaches for the retrieval of duplicate reports in issue tracking systems that perform just-in-time duplicate retrieval
Authors
Mohamed Sami Rakha
Cor-Paul Bezemer
Ahmed E. Hassan
Publication date
23 January 2018
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 5/2018
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-017-9590-5
