Published in: Empirical Software Engineering 2/2021

01.03.2021

Comparing the results of replications in software engineering

Authors: Adrian Santos, Sira Vegas, Markku Oivo, Natalia Juristo


Abstract

Context

It has been argued that software engineering replications are useful for verifying the results of previous experiments. However, it has not yet been agreed how to check whether the results hold across replications. In addition, some authors suggest that replications that do not verify the results of previous experiments can be used to identify the contextual variables causing the discrepancies.

Objective

Study how to assess the (dis)similarity of the results of SE replications when they are compared to verify the results of previous experiments, and how to identify whether contextual variables are influencing those results.

Method

We run simulations to learn how different ways of comparing replication results behave when verifying the results of previous experiments. To illustrate how to deal with context-induced changes, we analyze three groups of replications from our own research on test-driven development and testing techniques.

Results

The direct comparison of p-values and effect sizes does not appear to be suitable for verifying the results of previous experiments and examining the variables possibly affecting the results in software engineering. Analytical methods such as meta-analysis should be used to assess the similarity of software engineering replication results and identify discrepancies in results.

Conclusion

The result achieved in a baseline experiment should no longer be regarded as a finding that needs to be reproduced, but as a small piece of evidence within a larger picture that only emerges after assembling many small pieces to complete the puzzle.


Footnotes
1
Refer to Camerer et al. (2018) for an overview of the approaches that have been proposed in different branches of science to check whether results hold across replications.
 
2
We use this example for illustrative purposes, although we are unlikely to ever be able to establish this fact unless we conduct enough experiments to sample the whole population (Ioannidis 2005; Cumming 2013; Camerer et al. 2018).
 
3
Cohen’s d \(= \frac{57.49-51.42}{\sqrt{(9.73^{2}+8.30^{2})/2}} = 0.67\).
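
For illustration only (the code and the helper name cohens_d are ours, not the paper’s), this calculation can be reproduced in a few lines of Python:

```python
import math

def cohens_d(mean_1, mean_2, sd_1, sd_2):
    """Cohen's d for two groups, pooling the SDs as the root mean
    square of the two group standard deviations (equal group sizes)."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd

# Values from this footnote: (57.49 - 51.42) / sqrt((9.73^2 + 8.30^2) / 2)
print(round(cohens_d(57.49, 51.42, 9.73, 8.30), 2))  # prints 0.67
```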
 
4
Note that the results of experiments with larger sample sizes lie closer to the findings for the population.
 
5
When we refer to “replications that estimate an identical true effect size”, we do it in the same terms as Borenstein et al. (2011), where the authors refer to a situation in which all factors that could influence the effect size are the same in all the studies, and thus, the true effect size is the same, since all differences in the estimated effects are due to sampling error.
 
6
Note that the experimental design does not affect the value of the estimated effect size; it influences the ability to detect the true effect size, which is known as statistical power.
 
7
Note, however, that the true effect size is just as important. In fact, given the problems with questionable research practices and publication bias, a small experiment will most likely overestimate the effect size, because a small effect size cannot reach statistical significance with a small sample.
 
8
We are aware that SE data may not be normal (Kitchenham et al. 2017; Arcuri and Briand 2011). However, we opted to use normal distributions as they are a convenient way of expressing the true effect size in the population in terms of Cohen’s d (Borenstein et al. 2011). We decided to express the effect size using Cohen’s d because of its common use in SE (Kampenes et al. 2007). We discuss the shortcomings of simulating normally distributed data in the threats to validity section.
 
9
Cohen’s d \(= \frac{58-50}{10} = 0.8\) (Cumming 2013).
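
As a minimal sketch (ours, not the paper’s simulation code), the following Python snippet simulates normally distributed two-group experiments with the true Cohen’s d of 0.8 from footnote 9 and shows how the estimated effect size fluctuates less as the sample size grows (footnote 4):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_replication(n, mean_treatment=58, mean_control=50, sd=10):
    """One simulated two-group experiment; true d = (58 - 50) / 10 = 0.8."""
    treatment = rng.normal(mean_treatment, sd, n)
    control = rng.normal(mean_control, sd, n)
    pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treatment.mean() - control.mean()) / pooled_sd

# Mean and spread of the estimated d across 1000 simulated replications per group size
for n in (10, 50, 500):
    estimates = [simulate_replication(n) for _ in range(1000)]
    print(n, round(float(np.mean(estimates)), 2), round(float(np.std(estimates)), 2))
```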
 
10
I² is interpreted as follows: 25% low, 50% medium, and 75% high (Higgins et al. 2003).
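
For reference, I² can be computed from Cochran’s Q as defined by Higgins et al. (2003); the sketch below is our illustration, with hypothetical input values:

```python
import numpy as np

def i_squared(effects, variances):
    """I^2 = max(0, (Q - df) / Q) * 100, where Q is Cochran's heterogeneity statistic."""
    y = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)   # inverse-variance weights
    pooled = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - pooled) ** 2)              # Cochran's Q
    df = len(y) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# Hypothetical effect sizes (Cohen's d) and variances from three replications
print(round(i_squared([0.2, 0.6, 0.9], [0.04, 0.05, 0.06]), 1))
```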
 
11
Note, however, that there are other alternatives to random-effects meta-analysis. Empirical Bayes, for instance, has the advantage of being more explicit and using a better approximation algorithm.
 
12
For a detailed description of the experiments, their designs, and results please refer to Juristo et al. (2012).
 
13
A population of novice testers with limited experience in software development, 12 hours of training on testing techniques, and toy programs as the objects under test.
 
14
For a detailed description of the experiments, their designs, and results please refer to Tosun et al. (2017).
 
15
The survey and its results were published elsewhere (Dieste et al. 2017).
 
16
For simplicity’s sake, we treat the variables measured in the survey as continuous, as in Dieste et al. (2017).
 
17
This estimate was calculated based on the output of the meta-analysis that we undertook with the metafor R package (Viechtbauer 2010).
 
18
τ² can be estimated with different estimation methods, each of which may provide a potentially different estimate (Langan et al. 2018). In this article, we use restricted maximum likelihood (REML), which Langan et al. (2018) recommend for continuous outcomes. A large number of experiments is needed to estimate τ² precisely (i.e., 5 (Feaster et al. 2011), 10 (Snijders 2011), 15 or even more (McNeish and Stapleton 2016)).
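
To show the mechanics of the pooling step, the sketch below (ours) combines hypothetical replication results with a random-effects model; for simplicity it uses the DerSimonian–Laird estimator of τ² rather than the REML estimator that metafor computed for the analyses in this article (footnote 17):

```python
import numpy as np

def random_effects_pool(effects, variances):
    """Random-effects meta-analysis with the DerSimonian-Laird tau^2 estimator
    (a simpler stand-in for the REML estimator used in the article)."""
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                    # fixed-effect weights
    pooled_fe = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - pooled_fe) ** 2)           # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)        # between-study variance
    w_re = 1.0 / (v + tau2)                        # random-effects weights
    pooled = np.sum(w_re * y) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return pooled, se, tau2

# Hypothetical effect sizes (Cohen's d) and variances from three replications
print(random_effects_pool([0.2, 0.6, 0.9], [0.04, 0.05, 0.06]))
```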
 
19
Note that this is a sub-optimal approach because of the threat of introducing heterogeneity due to unacknowledged variables. It is better to conduct fewer, larger studies. Very often, however, the only option is to run several small studies.
 
20
Unfortunately, there are no hard-and-fast rules for establishing how many replications are enough (Borenstein et al. 2011). This is because the precision of the results may be affected by the distribution of sample sizes across the replications (Ruvuna 2004), the experimental design of the replications (Morris and DeShon 2002), the variability of the data (Cumming 2013), etc.
 
References
Arcuri A, Briand L (2011) A practical guide for using statistical tests to assess randomized algorithms in software engineering. In: 2011 33rd International conference on software engineering (ICSE). IEEE, pp 1–10
Badampudi D, Wohlin C, Gorschek T (2019) Contextualizing research evidence through knowledge translation in software engineering. In: Proceedings of the evaluation and assessment on software engineering, EASE 2019, Copenhagen, Denmark, April 15–17, 2019. ACM, pp 306–311
Baker M (2016) Is there a reproducibility crisis? A Nature survey lifts the lid on how researchers view the 'crisis' rocking science and what they think will help. Nature 533(7604):452–455
Basili VR, Shull F, Lanubile F (1999) Building knowledge through families of experiments. IEEE Trans Softw Eng 25(4):456–473
Beck K (2003) Test-driven development: by example. Addison-Wesley Professional, Boston
Bezerra RM, da Silva FQ, Santana AM, Magalhaes CV, Santos RE (2015) Replication of empirical studies in software engineering: an update of a systematic mapping study. In: Proceedings of the 2015 9th international symposium on empirical software engineering and measurement (ESEM). IEEE, pp 1–4
Biondi-Zoccai G (2016) Umbrella reviews: evidence synthesis with overviews of reviews and meta-epidemiologic studies. Springer, Berlin
Borenstein M, Hedges LV, Higgins JP, Rothstein HR (2011) Introduction to meta-analysis. Wiley, Chichester
Briand L, Bianculli D, Nejati S, Pastore F, Sabetzadeh M (2017) The case for context-driven software engineering research: generalizability is overrated. IEEE Softw 34(5):72–75
Brooks A, Roper M, Wood M, Daly J, Miller J (2003) Replication of software engineering experiments. Empirical Foundations of Computer Science Technical Report, EfoCS-51-2003. Department of Computer and Information Sciences, University of Strathclyde
Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, Robinson ES, Munafò MR (2013) Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14(5):365
Camerer CF, Dreber A, Holzmeister F, Ho TH, Huber J, Johannesson M, Kirchler M, Nave G, Nosek BA, Pfeiffer T et al (2018) Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat Hum Behav 2(9):637–644
Cohen J (1988) Statistical power analysis for the behavioral sciences. Lawrence Erlbaum Associates, Hillsdale, pp 20–26
Cumming G (2013) Understanding the new statistics: effect sizes, confidence intervals, and meta-analysis. Routledge, New York
Cumming G (2014) The new statistics: why and how. Psychol Sci 25(1):7–29
Da Silva FQ, Suassuna M, França ACC, Grubb AM, Gouveia TB, Monteiro CV, dos Santos IE (2014) Replication of empirical studies in software engineering research: a systematic mapping study. Empir Softw Eng 19(3):501–557
de França BBN, Travassos GH (2016) Experimentation with dynamic simulation models in software engineering: planning and reporting guidelines. Empir Softw Eng 21(3):1302–1345
de Magalhães CV, da Silva FQ, Santos RE, Suassuna M (2015) Investigations about replication of empirical studies in software engineering: a systematic mapping study. Inf Softw Technol 64:76–101
Dieste O, Aranda AM, Uyaguari F, Turhan B, Tosun A, Fucci D, Oivo M, Juristo N (2017) Empirical evaluation of the effects of experience on code quality and programmer productivity: an exploratory study. Empir Softw Eng 22(5):2457–2542
Duran J, Ntafos S (1984) An evaluation of random testing. IEEE Trans Softw Eng SE-10(4):438–444
Dybå T, Kampenes VB, Sjøberg DI (2006) A systematic review of statistical power in software engineering experiments. Inf Softw Technol 48(8):745–755
Egger M, Davey-Smith G, Altman D (2008) Systematic reviews in health care: meta-analysis in context. Wiley, New York
Ellis PD (2010) The essential guide to effect sizes: statistical power, meta-analysis, and the interpretation of research results. Cambridge University Press, Cambridge
Feaster DJ, Mikulich-Gilbertson S, Brincks AM (2011) Modeling site effects in the design and analysis of multi-site trials. Am J Drug Alcohol Abuse 37(5):383–391
Field A (2013) Discovering statistics using IBM SPSS statistics. Sage, Thousand Oaks
Fisher D, Copas A, Tierney J, Parmar M (2011) A critical review of methods for the assessment of patient-level interactions in individual participant data meta-analysis of randomized trials, and guidance for practitioners. J Clin Epidemiol 64(9):949–967
Gagnier JJ, Moher D, Boon H, Beyene J, Bombardier C (2012) Investigating clinical heterogeneity in systematic reviews: a methodologic review of guidance in the literature. BMC Med Res Methodol 12(1):111
Gnedenko BV (2020) Theory of probability, 6th edn. CRC Press, Boca Raton
Gómez OS, Juristo N, Vegas S (2010) Replications types in experimental disciplines. In: Proceedings of the 2010 ACM-IEEE international symposium on empirical software engineering and measurement. ACM, p 3
Gómez OS, Juristo N, Vegas S (2014) Understanding replication of experiments in software engineering: a classification. Inf Softw Technol 56(8):1033–1048
Groenwold RH, Rovers MM, Lubsen J, van der Heijden GJ (2010) Subgroup effects despite homogeneous heterogeneity test results. BMC Med Res Methodol 10(1):43
Higgins JP, Green S (2011) Cochrane handbook for systematic reviews of interventions, vol 4. Wiley, Chichester
Higgins JP, Thompson SG, Deeks JJ, Altman DG (2003) Measuring inconsistency in meta-analyses. BMJ 327(7414):557–560
Höst M, Wohlin C, Thelin T (2005) Experimental context classification: incentives and experience of subjects. In: 27th International conference on software engineering (ICSE 2005), 15–21 May 2005, St. Louis, Missouri, USA. ACM, pp 470–478
Ioannidis JP (2005) Why most published research findings are false. PLoS Med 2(8):e124
Ioannidis J, Patsopoulos N, Rothstein H (2008) Research methodology: reasons or excuses for avoiding meta-analysis in forest plots. BMJ: Br Med J 336(7658):1413–1415
Jedlitschka A, Ciolkowski M, Pfahl D (2008) Reporting controlled experiments in software engineering. In: Guide to advanced empirical software engineering. Springer, Berlin
Jørgensen M, Dybå T, Liestøl K, Sjøberg DI (2016) Incorrect results in software engineering experiments: how to improve research practices. J Syst Softw 116:133–145
Juristo N (2016) Once is not enough: why we need replication. In: Menzies T, Williams L, Zimmermann T (eds) Perspectives on data science for software engineering. Morgan Kaufmann
Juristo N, Moreno AM (2011) Basics of software engineering experimentation. Springer Science & Business Media
Juristo N, Vegas S (2009) Using differences among replications of software engineering experiments to gain knowledge. In: Proceedings of the 2009 3rd international symposium on empirical software engineering and measurement (ESEM). IEEE, pp 356–366
Juristo N, Vegas S (2011) The role of non-exact replications in software engineering experiments. Empir Softw Eng 16(3):295–324
Juristo N, Vegas S, Solari M, Abrahao S, Ramos I (2012) Comparing the effectiveness of equivalence partitioning, branch testing and code reading by stepwise abstraction applied by subjects. In: 2012 IEEE fifth international conference on software testing, verification and validation. IEEE, pp 330–339
Kampenes VB, Dybå T, Hannay JE, Sjøberg DI (2007) A systematic review of effect size in software engineering experiments. Inf Softw Technol 49(11):1073–1086
Kitchenham B (2008) The role of replications in empirical software engineering—a word of warning. Empir Softw Eng 13(2):219–221
Kitchenham B, Madeyski L, Budgen D, Keung J, Brereton P, Charters S, Gibbs S, Pohthong A (2017) Robust statistical methods for empirical software engineering. Empir Softw Eng 22(2):579–630
Langan D, Higgins JP, Jackson D, Bowden J, Veroniki AA, Kontopantelis E, Viechtbauer W, Simmonds M (2018) A comparison of heterogeneity variance estimators in simulated random-effects meta-analyses. Res Synth Methods
Lau J, Ioannidis JP, Schmid CH (1998) Summing up evidence: one answer is not always enough. Lancet 351(9096):123–127
Leandro G (2008) Meta-analysis in medical research: the handbook for the understanding and practice of meta-analysis. Wiley, Hoboken
Makel MC, Plucker JA, Hegarty B (2012) Replications in psychology research: how often do they really occur? Perspect Psychol Sci 7(6):537–542
Maxwell SE, Lau MY, Howard GS (2015) Is psychology suffering from a replication crisis? What does "failure to replicate" really mean? Am Psychol 70(6):487
McNeish DM, Stapleton LM (2016) The effect of small sample size on two-level model estimates: a review and illustration. Educ Psychol Rev 28(2):295–314
Miller J (2005) Replicating software engineering experiments: a poisoned chalice or the holy grail. Inf Softw Technol 47(4):233–244
Morris SB, DeShon RP (2002) Combining effect size estimates in meta-analysis with repeated measures and independent-groups designs. Psychol Methods 7(1):105
Morris TP, White IR, Crowther MJ (2019) Using simulation studies to evaluate statistical methods. Stat Med 38(11):2074–2102
Murphy GC (2019) Beyond integrated development environments: adding context to software development. In: Proceedings of the 41st international conference on software engineering: new ideas and emerging results (ICSE-NIER). IEEE, pp 73–76
Myers GJ, Sandler C, Badgett T (2011) The art of software testing. Wiley, New York
Ntafos S (1998) On random and partition testing. In: Proceedings of ACM SIGSOFT international symposium on software testing and analysis. ACM Press, pp 42–48
Pashler H, Wagenmakers EJ (2012) Editors' introduction to the special section on replicability in psychological science: a crisis of confidence? Perspect Psychol Sci 7(6):528–530
Patil P, Peng RD, Leek JT (2016) What should researchers expect when they replicate studies? A statistical view of replicability in psychological science. Perspect Psychol Sci 11(4):539–544
Petersen K, Wohlin C (2009) Context in industrial software engineering research. In: Proceedings of the third international symposium on empirical software engineering and measurement, ESEM 2009, October 15–16, 2009, Lake Buena Vista, Florida, USA. IEEE Computer Society, pp 401–404
Petitti DB (2000) Meta-analysis, decision analysis, and cost-effectiveness analysis: methods for quantitative synthesis in medicine, vol 31. Oxford University Press, USA
Runeson P, Höst M (2009) Guidelines for conducting and reporting case study research in software engineering. Empir Softw Eng 14(2):131
Ruvuna F (2004) Unequal center sizes, sample size, and power in multicenter clinical trials. Drug Inf J 38(4):387–394
Shadish WR, Cook TD, Campbell DT (2002) Experimental and quasi-experimental designs for generalized causal inference. Wadsworth Cengage Learning
Shepperd M, Ajienka N, Counsell S (2018) The role and value of replication in empirical software engineering results. Inf Softw Technol 99:120–132
Shull F, Mendonça MG, Basili V, Carver J, Maldonado JC, Fabbri S, Travassos GH, Ferreira MC (2004) Knowledge-sharing issues in experimental software engineering. Empir Softw Eng 9(1–2):111–137
Simmonds MC, Higgins JP, Stewart LA, Tierney JF, Clarke MJ, Thompson SG (2005) Meta-analysis of individual patient data from randomized trials: a review of methods used in practice. Clin Trials 2(3):209–217
Snijders TA (2011) Multilevel analysis. In: International encyclopedia of statistical science. Springer, pp 879–882
Thompson B (1994) The pivotal role of replication in psychological research: empirically evaluating the replicability of sample results. J Pers 62(2):157–176
Tosun A, Dieste O, Fucci D, Vegas S, Turhan B, Erdogmus H, Santos A, Oivo M, Toro K, Jarvinen J et al (2017) An industry experiment on the effects of test-driven development on external quality and productivity. Empir Softw Eng 22(6):2763–2805
Viechtbauer W (2010) Metafor: meta-analysis package for R. R package version 1–0
Whitehead A (2002) Meta-analysis of controlled clinical trials, vol 7. Wiley, New York
Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2012) Experimentation in software engineering. Springer Science & Business Media
Metadata
Title
Comparing the results of replications in software engineering
Authors
Adrian Santos
Sira Vegas
Markku Oivo
Natalia Juristo
Publication date
01.03.2021
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 2/2021
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-020-09907-7
