Published in: Empirical Software Engineering 2/2018

01.12.2017

Aggregating Association Rules to Improve Change Recommendation

Authors: Thomas Rolfsnes, Leon Moonen, Stefano Di Alesio, Razieh Behjati, Dave Binkley

Abstract

As the complexity of software systems grows, it becomes increasingly difficult for developers to be aware of all the dependencies that exist between artifacts (e.g., files or methods) of a system. Change recommendation has been proposed as a technique to overcome this problem: it suggests to a developer source-code artifacts that are relevant to her changes. Association rule mining has shown promise in deriving such recommendations by uncovering relevant patterns in the system's change history. The strength of the mined association rules is captured using a variety of interestingness measures. However, when more than one rule applies, state-of-the-art recommendation engines typically use only the rule with the highest interestingness value. In contrast, we argue that multiple applicable rules constitute collective evidence, and that aggregating those rules (and their evidence) will lead to more accurate change recommendation. To investigate this hypothesis, we conduct a large empirical study of 15 open source software systems and two systems from our industry partners. We evaluate association rule aggregation using four variants of the change history for each system studied, enabling us to compare two different levels of granularity in two different scenarios. Furthermore, we study 40 interestingness measures using the rules produced by two different mining algorithms. The results show that (1) between 13 and 90% of change recommendations can be improved by rule aggregation, (2) rule aggregation almost always improves change recommendation for both algorithms and all measures, and (3) fine-grained histories benefit more from rule aggregation.
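
The abstract contrasts ranking candidates by the single strongest applicable rule with aggregating all applicable rules. The Python sketch below illustrates that contrast under assumptions that are not taken from the paper: each commit is a set of file names, rules have a single-file antecedent, confidence is the only interestingness measure, and aggregation uses a noisy-OR (probabilistic sum), 1 - prod(1 - c_i). The function names (`support`, `applicable_rules`, `recommend`) and the toy history are hypothetical; the paper's own mining algorithms and aggregator functions are not described in this excerpt.

```python
from collections import defaultdict
from itertools import chain

def support(history, items):
    """Count the commits (transactions) that contain every file in `items`."""
    return sum(1 for tx in history if items <= tx)

def applicable_rules(history, query):
    """Yield (x, y, confidence) for rules {x} -> {y}, where x is a changed
    file in `query` and y is a file outside `query` that co-changed with x.
    Single-file antecedents are a simplifying assumption of this sketch."""
    for x in sorted(query):
        sup_x = support(history, {x})
        if sup_x == 0:
            continue  # x never changed before: no rule can fire
        co_changed = set(chain.from_iterable(tx for tx in history if x in tx))
        for y in sorted(co_changed - query):
            yield x, y, support(history, {x, y}) / sup_x

def recommend(history, query, aggregate):
    """Rank candidate files by rule confidence.

    aggregate=False keeps only the strongest rule per candidate;
    aggregate=True combines all applicable rules with a noisy-OR,
    1 - prod(1 - c_i), chosen here purely for illustration."""
    confs = defaultdict(list)
    for _, y, conf in applicable_rules(history, query):
        confs[y].append(conf)
    scores = {}
    for y, cs in confs.items():
        if aggregate:
            remaining = 1.0
            for c in cs:
                remaining *= 1.0 - c
            scores[y] = 1.0 - remaining
        else:
            scores[y] = max(cs)
    # Sort by descending score, breaking ties by file name.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(y, round(s, 3)) for y, s in ranked]

# Hypothetical toy history: each set is the files changed in one commit.
history = [
    {"a.c", "b.c"}, {"a.c", "b.c"}, {"a.c", "b.c"},
    {"a.c", "c.c"}, {"a.c", "c.c"},
    {"d.c", "c.c"}, {"d.c", "e.c"},
]
query = {"a.c", "d.c"}  # files the developer has just changed

print(recommend(history, query, aggregate=False))
# [('b.c', 0.6), ('c.c', 0.5), ('e.c', 0.5)]
print(recommend(history, query, aggregate=True))
# [('c.c', 0.7), ('b.c', 0.6), ('e.c', 0.5)]
```

Here c.c is supported by two moderate rules (a.c -> c.c with confidence 0.4, d.c -> c.c with 0.5), so it overtakes b.c (one rule with 0.6) only when the evidence is aggregated. A noisy-OR is only meaningful for measures bounded in [0, 1] such as confidence; as footnote 2 notes, some interestingness measures can produce negative values, which this particular aggregator would not handle.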

Footnotes
1
Other levels of granularity are possible, as our algorithms are granularity-agnostic. Thus, our initial description at the file level is without loss of generality. Given suitable co-change data, the algorithms can relate methods or variables just as well as files, a fact that is exploited later in the paper.
 
2
The three measures are descriptive confirmed confidence, example and counterexample rate, and least contradictions. Other measures capable of taking negative values also occasionally produced them, although quite rarely.
 
3
Formal proofs for the three aggregator functions are provided in the Appendix.
 
4
For a normally distributed population of 50 000, a minimum of 657 samples is required to attain 99% confidence with a 5% confidence interval that the sampled transactions are representative of the population. Since we do not know the distribution of transactions, we correct the sample size to the number needed for a non-parametric test to have the same ability to reject the null hypothesis. This correction is done using the Asymptotic Relative Efficiency (ARE). As AREs differ for various non-parametric tests, we choose the lowest coefficient, 0.637, yielding a conservative minimum sample size of 657/0.637 ≈ 1031.4, i.e., 1032 transactions when rounded up. Hence, a sample size of 1100 is more than sufficient to attain 99% confidence with a 5% confidence interval that the samples are representative of the population.
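
The arithmetic in this footnote can be reproduced with a short Python sketch. The footnote does not name the formula behind the 657 figure; this sketch assumes Cochran's sample-size formula for a proportion (p = 0.5, z ≈ 2.58 for 99% confidence, margin 0.05) with a finite population correction, which yields 657 for a population of 50 000. The ARE of 0.637 is taken from the footnote (it matches 2/π, the efficiency of the sign test relative to the t-test under normality). The function names are hypothetical.

```python
import math

def cochran_sample_size(population, z=2.58, margin=0.05, p=0.5):
    """Assumed formula: Cochran's n0 = z^2 p(1-p) / e^2, then the
    finite population correction n = n0 / (1 + (n0 - 1) / N)."""
    n0 = z * z * p * (1 - p) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))

def are_adjusted(n_parametric, are=0.637):
    """Inflate a parametric sample size by the Asymptotic Relative
    Efficiency so a non-parametric test has comparable power."""
    return math.ceil(n_parametric / are)

n_param = cochran_sample_size(50_000)
n_nonparam = are_adjusted(n_param)
print(n_param, n_nonparam)  # 657 1032
print(1100 >= n_nonparam)   # True: 1100 sampled transactions suffice
```

Rounding up with `math.ceil` at both steps keeps the estimate conservative, matching the footnote's choice of the lowest ARE coefficient.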
 
5
Exceptions are descriptive confirmed confidence and the example and counterexample rate, for which aggregation was also found to have a non-significant effect in Fig. 4.
 
Metadata
Title
Aggregating Association Rules to Improve Change Recommendation
Authors
Thomas Rolfsnes
Leon Moonen
Stefano Di Alesio
Razieh Behjati
Dave Binkley
Publication date
01.12.2017
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 2/2018
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-017-9560-y