Skip to main content
Erschienen in: Empirical Software Engineering 4/2018

06.03.2018

What are the effects of history length and age on mining software change impact?

verfasst von: Leon Moonen, Thomas Rolfsnes, Dave Binkley, Stefano Di Alesio

Erschienen in: Empirical Software Engineering | Ausgabe 4/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The goal of Software Change Impact Analysis is to identify artifacts (typically source-code files or individual methods therein) potentially affected by a change. Recently, there has been increased interest in mining software change impact based on evolutionary coupling. A particularly promising approach uses association rule mining to uncover potentially affected artifacts from patterns in the system’s change history. Two main considerations when using this approach are the history length, the number of transactions from the change history used to identify the impact of a change, and history age, the number of transactions that have occurred since patterns were last mined from the history. Although history length and age can significantly affect the quality of mining results, few guidelines exist on how to best select appropriate values for these two parameters. In this paper, we empirically investigate the effects of history length and age on the quality of change impact analysis using mined evolutionary coupling. Specifically, we report on a series of systematic experiments using three state-of-the-art mining algorithms that involve the change histories of two large industrial systems and 17 large open source systems. In these experiments, we vary the length and age of the history used to mine software change impact, and assess how this affects precision and applicability. Results from the study are used to derive practical guidelines for choosing history length and age when applying association rule mining to conduct software change impact analysis.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Fußnoten
1
Note that various granularity choices are possible since the algorithms are granularity agnostic; if fine-grained co-change data is available (or computable), the same algorithms will relate methods or variables just as well as more coarse-grained files. In this paper we consider a practical fine-grained level that uses method-level information where possible (i.e., for source files that can be parsed, as discussed later in the paper), and file-level information otherwise (e.g., for test plans, build files, and configuration files).
 
2
For a normally distributed population of 50 000, a minimum of 657 samples is required to attain 99% confidence with a 5% confidence interval that the sampled transactions are representative of the population. Since we do not know the distribution of transactions, we correct the sample size to the number needed for a non-parametric test to have the same ability to reject the null hypothesis. This correction is done using the Asymptotic Relative Efficiency (ARE). As AREs differ for various non-parametric tests, we choose the lowest coefficient, 0.637, yielding a conservative minimum sample size of 657/0.637 = 1032 transactions. Hence, a sample size of 1100 is more than sufficient to attain 99% confidence with a 5% confidence interval that the samples are representative of the population.
 
3
Observe that the MAP values in the three subtables of Table 9 were obtained from three different randomly sampled collections, which explains the variation in MAPs for ages repeated in different collections. Although these values are within the 5% confidence interval targeted by our sampling approach, it still means that cutoff values obtained from one collection cannot be used to look up corresponding ages in another collection, as also shown by the values for age2000 and age200 in Table 9.
 
Literatur
Zurück zum Zitat Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: ACM SIGMOD international conference on management of data. ACM, pp 207–216 Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: ACM SIGMOD international conference on management of data. ACM, pp 207–216
Zurück zum Zitat Alali A (2008) An empirical characterization of commits in software repositories. Ms.c. Kent State University, 53 Alali A (2008) An empirical characterization of commits in software repositories. Ms.c. Kent State University, 53
Zurück zum Zitat Alali A, Kagdi H, Maletic JI (2008) What’s a typical commit? A characterization of open source software repositories. In: International conference on program comprehension (ICPC). IEEE, pp 182–191 Alali A, Kagdi H, Maletic JI (2008) What’s a typical commit? A characterization of open source software repositories. In: International conference on program comprehension (ICPC). IEEE, pp 182–191
Zurück zum Zitat Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. ACM, p 513 Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. ACM, p 513
Zurück zum Zitat Bohner S, Arnold R (1996) Software change impact analysis. IEEE, USA Bohner S, Arnold R (1996) Software change impact analysis. IEEE, USA
Zurück zum Zitat Canfora G, Cerulo L (2005) Impact analysis by mining software and change request repositories. In: International software metrics symposium (METRICS). IEEE, pp 29–37 Canfora G, Cerulo L (2005) Impact analysis by mining software and change request repositories. In: International software metrics symposium (METRICS). IEEE, pp 29–37
Zurück zum Zitat Eick S et al (2001) Does code decay? Assessing the evidence from change management data. IEEE Trans Softw Eng 27(1):1–12CrossRef Eick S et al (2001) Does code decay? Assessing the evidence from change management data. IEEE Trans Softw Eng 27(1):1–12CrossRef
Zurück zum Zitat Gall H, Hajek K, Jazayeri M (1998) Detection of logical coupling based on product release history. In: IEEE international conference on software maintenance (ICSM). IEEE, pp 190–198 Gall H, Hajek K, Jazayeri M (1998) Detection of logical coupling based on product release history. In: IEEE international conference on software maintenance (ICSM). IEEE, pp 190–198
Zurück zum Zitat German DM (2006) An empirical study of fine-grained software modifications. Empir Softw Eng 11(3):369–393CrossRef German DM (2006) An empirical study of fine-grained software modifications. Empir Softw Eng 11(3):369–393CrossRef
Zurück zum Zitat Gethers M et al (2011) An adaptive approach to impact analysis from change requests to source code. In: IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 540–543 Gethers M et al (2011) An adaptive approach to impact analysis from change requests to source code. In: IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 540–543
Zurück zum Zitat Graves T L et al (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661CrossRef Graves T L et al (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661CrossRef
Zurück zum Zitat Hassan AE (2008) The road ahead for Mining Software Repositories. In: Frontiers of software maintenance. IEEE, pp 48–57 Hassan AE (2008) The road ahead for Mining Software Repositories. In: Frontiers of software maintenance. IEEE, pp 48–57
Zurück zum Zitat Hassan AE, Holt R (2004) Predicting change propagation in software systems. In: IEEE international conference on software maintenance (ICSM). IEEE, pp 284–293 Hassan AE, Holt R (2004) Predicting change propagation in software systems. In: IEEE international conference on software maintenance (ICSM). IEEE, pp 284–293
Zurück zum Zitat Jaafar F et al (2014) Detecting asynchrony and dephase change patterns by mining software repositories. J Softw: Evol Process 26(1):77–106 Jaafar F et al (2014) Detecting asynchrony and dephase change patterns by mining software repositories. J Softw: Evol Process 26(1):77–106
Zurück zum Zitat Jashki M-A, Zafarani R, Bagheri E (2008) Towards a more efficient static software change impact analysis method. In: ACM SIGPLAN-SIGSOFT workshop on program analysis for software tools and engineering (PASTE). ACM, pp 84–90 Jashki M-A, Zafarani R, Bagheri E (2008) Towards a more efficient static software change impact analysis method. In: ACM SIGPLAN-SIGSOFT workshop on program analysis for software tools and engineering (PASTE). ACM, pp 84–90
Zurück zum Zitat Jiang N, Gruenwald L (2006) Research issues in data stream association rule mining. ACM SIGMOD Rec 35(1):14–19CrossRef Jiang N, Gruenwald L (2006) Research issues in data stream association rule mining. ACM SIGMOD Rec 35(1):14–19CrossRef
Zurück zum Zitat Kagdi H, Yusuf S, Maletic JI (2006) Mining sequences of changed-files from version histories. In: International workshop on mining software repositories (MSR). ACM, pp 47–53 Kagdi H, Yusuf S, Maletic JI (2006) Mining sequences of changed-files from version histories. In: International workshop on mining software repositories (MSR). ACM, pp 47–53
Zurück zum Zitat Kagdi H, Gethers M, Poshyvanyk D (2013) Integrating conceptual and logical couplings for change impact analysis in software. Empir Softw Eng 18(5):933–969CrossRef Kagdi H, Gethers M, Poshyvanyk D (2013) Integrating conceptual and logical couplings for change impact analysis in software. Empir Softw Eng 18(5):933–969CrossRef
Zurück zum Zitat Kolassa C, Riehle D, Salim MA (2013) The empirical commit frequency distribution of open source projects. In: International Symposium On Open Collaboration (WikiSym). ACM, pp 1–8 Kolassa C, Riehle D, Salim MA (2013) The empirical commit frequency distribution of open source projects. In: International Symposium On Open Collaboration (WikiSym). ACM, pp 1–8
Zurück zum Zitat Law J, Rothermel G (2003) Whole program path-based dynamic impact analysis. In: International conference on software engineering (ICSE). IEEE, pp 308–318 Law J, Rothermel G (2003) Whole program path-based dynamic impact analysis. In: International conference on software engineering (ICSE). IEEE, pp 308–318
Zurück zum Zitat Lin W, Alvarez SA, Ruiz C (2002) Efficient adaptive-support association rule mining for recommender systems. Data Min Knowl Disc 6(1):83–105MathSciNetCrossRef Lin W, Alvarez SA, Ruiz C (2002) Efficient adaptive-support association rule mining for recommender systems. Data Min Knowl Disc 6(1):83–105MathSciNetCrossRef
Zurück zum Zitat Maimon O, Rokach L (1383) In: Maimon O, Rokach L (eds) Data mining and knowledge discovery handbook. Springer, Berlin Maimon O, Rokach L (1383) In: Maimon O, Rokach L (eds) Data mining and knowledge discovery handbook. Springer, Berlin
Zurück zum Zitat Moonen L et al (2016a) Exploring the effects of history length and age on mining software change impact. In: IEEE international working conference on source code analysis and manipulation (SCAM), pp 207– 216 Moonen L et al (2016a) Exploring the effects of history length and age on mining software change impact. In: IEEE international working conference on source code analysis and manipulation (SCAM), pp 207– 216
Zurück zum Zitat Moonen L et al (2016b) Practical guidelines for change recommendation using association rule mining. In: International conference on automated software engineering (ASE). ACM, pp 732–743 Moonen L et al (2016b) Practical guidelines for change recommendation using association rule mining. In: International conference on automated software engineering (ASE). ACM, pp 732–743
Zurück zum Zitat Podgurski A, Clarke L (1990) A formal model of program dependences and its implications for software testing, debugging, and maintenance. IEEE Trans Softw Eng 16(9):965–979CrossRef Podgurski A, Clarke L (1990) A formal model of program dependences and its implications for software testing, debugging, and maintenance. IEEE Trans Softw Eng 16(9):965–979CrossRef
Zurück zum Zitat Ren X et al (2004) Chianti: a tool for change impact analysis of java programs. In: ACM SIGPLAN conference on object-oriented programming, systems, languages, and applications (OOPSLA), pp 432–448 Ren X et al (2004) Chianti: a tool for change impact analysis of java programs. In: ACM SIGPLAN conference on object-oriented programming, systems, languages, and applications (OOPSLA), pp 432–448
Zurück zum Zitat Robbes R, Pollet D, Lanza M (2008) Logical coupling based on fine- grained change information. In: Working conference on reverse engineering (WCRE). IEEE, pp 42–46 Robbes R, Pollet D, Lanza M (2008) Logical coupling based on fine- grained change information. In: Working conference on reverse engineering (WCRE). IEEE, pp 42–46
Zurück zum Zitat Rolfsnes T et al (2016a) Generalizing the analysis of evolutionary coupling for software change impact analysis. In: International conference on software analysis, evolution, and reengineering (SANER). IEEE, pp 201–212 Rolfsnes T et al (2016a) Generalizing the analysis of evolutionary coupling for software change impact analysis. In: International conference on software analysis, evolution, and reengineering (SANER). IEEE, pp 201–212
Zurück zum Zitat Rolfsnes T et al (2016b) Improving change recommendation using aggregated association rules. In: International conference on mining software repositories (MSR). ACM, pp 73–84 Rolfsnes T et al (2016b) Improving change recommendation using aggregated association rules. In: International conference on mining software repositories (MSR). ACM, pp 73–84
Zurück zum Zitat Schuirmann D (1981) On hypothesis testing to determine if the mean of a normal distribution is contained in a known interval. Biometrics Schuirmann D (1981) On hypothesis testing to determine if the mean of a normal distribution is contained in a known interval. Biometrics
Zurück zum Zitat Srikant R, Vu Q, Agrawal R (1997) Mining association rules with item constraints. In: International conference on knowledge discovery and data mining (KDD). AASI, pp 67–73 Srikant R, Vu Q, Agrawal R (1997) Mining association rules with item constraints. In: International conference on knowledge discovery and data mining (KDD). AASI, pp 67–73
Zurück zum Zitat Westlake W (1981) Response to T.B.L. Kirkwood: bioequivalence testing—a need to rethink. Biometrics 37:589–594CrossRef Westlake W (1981) Response to T.B.L. Kirkwood: bioequivalence testing—a need to rethink. Biometrics 37:589–594CrossRef
Zurück zum Zitat Yazdanshenas AR, Moonen L (2011) Crossing the boundaries while analyzing heterogeneous component-based software systems. In: IEEE international conference on software maintenance (ICSM). IEEE, pp 193–202 Yazdanshenas AR, Moonen L (2011) Crossing the boundaries while analyzing heterogeneous component-based software systems. In: IEEE international conference on software maintenance (ICSM). IEEE, pp 193–202
Zurück zum Zitat Ying ATT et al (2004) Predicting source code changes by mining change history. IEEE Trans Softw Eng 30(9):574–586CrossRef Ying ATT et al (2004) Predicting source code changes by mining change history. IEEE Trans Softw Eng 30(9):574–586CrossRef
Zurück zum Zitat Zanjani M B, Swartzendruber G, Kagdi H (2014) Impact analysis of change requests on source code based on interaction and commit histories. In: International working conference on mining software repositories (MSR), pp 162–171 Zanjani M B, Swartzendruber G, Kagdi H (2014) Impact analysis of change requests on source code based on interaction and commit histories. In: International working conference on mining software repositories (MSR), pp 162–171
Zurück zum Zitat Zheng Z, Kohavi R, Mason L (2001) Real world performance of association rule algorithms. In: SIGKDD international conference on knowledge discovery and data mining (KDD). ACM, pp 401–406 Zheng Z, Kohavi R, Mason L (2001) Real world performance of association rule algorithms. In: SIGKDD international conference on knowledge discovery and data mining (KDD). ACM, pp 401–406
Zurück zum Zitat Zimmermann T et al (2005) Mining version histories to guide software changes. IEEE Trans Softw Eng 31(6):429–445CrossRef Zimmermann T et al (2005) Mining version histories to guide software changes. IEEE Trans Softw Eng 31(6):429–445CrossRef
Metadaten
Titel
What are the effects of history length and age on mining software change impact?
verfasst von
Leon Moonen
Thomas Rolfsnes
Dave Binkley
Stefano Di Alesio
Publikationsdatum
06.03.2018
Verlag
Springer US
Erschienen in
Empirical Software Engineering / Ausgabe 4/2018
Print ISSN: 1382-3256
Elektronische ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-017-9588-z

Weitere Artikel der Ausgabe 4/2018

Empirical Software Engineering 4/2018 Zur Ausgabe

Premium Partner