Published in: Empirical Software Engineering, Issue 2/2016

1 April 2016

A contextual approach towards more accurate duplicate bug report detection and ranking

Authors: Abram Hindle, Anahita Alipour, Eleni Stroulia


Abstract

The issue-tracking systems used by software projects contain issues, bugs, or tickets written by a wide variety of bug reporters, with different levels of training and knowledge about the system under development. Typically, reporters lack the skills and/or time to search the issue-tracking system for similar issues already reported. As a result, many reports end up referring to the same issue, which effectively makes the bug-report triaging process time consuming and error prone. Many researchers have approached the bug-deduplication problem using off-the-shelf information-retrieval (IR) tools. In this work, we extend the state of the art by investigating how contextual information about software-quality attributes, software-architecture terms, and system-development topics can be exploited to improve bug deduplication. We demonstrate the effectiveness of our contextual bug-deduplication method at ranking duplicates on the bug repositories of the Android, Eclipse, Mozilla, and OpenOffice software systems. Based on this experience, we conclude that taking into account domain-specific context can improve IR methods for bug deduplication.

Metadata
Title
A contextual approach towards more accurate duplicate bug report detection and ranking
Authors
Abram Hindle
Anahita Alipour
Eleni Stroulia
Publication date
1 April 2016
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 2/2016
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-015-9387-3
