
2013 | OriginalPaper | Chapter

Challenges of Evaluating the Quality of Software Engineering Experiments

Authors: Oscar Dieste, Natalia Juristo

Published in: Perspectives on the Future of Software Engineering

Publisher: Springer Berlin Heidelberg


Abstract

Good-quality experiments are free of bias. Bias is considered to be related to internal validity (e.g., how well experiments are planned, designed, executed, and analysed). Quality scales and expert opinion are two approaches for assessing the quality of experiments.

Aim: Identify whether bias is related to the predictions made by quality scales and by expert opinion in software engineering (SE) experiments.

Method: We used a quality scale to determine the quality of 35 experiments from three systematic literature reviews. We used two different procedures (effect size and response ratio) to calculate the bias in diverse response variables for these experiments. Experienced researchers also assessed the quality of the same experiments. We analysed the correlations between the quality scores, bias and expert opinion.

Results: The relationship between quality scales, expert opinion and bias depends on the technology exercised in the experiments. The correlation between quality scales, expert opinion and bias is only correct when the technologies can be subjected to acceptable experimental control. Expert ratings, whether correct or incorrect, are more extreme than the quality scales.

Conclusions: A quality scale based on formal internal quality criteria will predict bias satisfactorily, provided that the technology can be properly controlled in the laboratory.
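The two procedures named in the Method (effect size and response ratio) and the correlation step can be illustrated with a minimal Python sketch. It uses the textbook definitions of Hedges' g and the log response ratio; how the chapter derives a bias value from these quantities is not shown on this page, so the function names, the quality scores and the bias values below are all hypothetical and serve only to show the shape of the analysis.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference with the usual small-sample correction."""
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # approximate Hedges correction factor
    return d * correction


def log_response_ratio(mean_t, mean_c):
    """Natural log of the ratio of treatment mean to control mean."""
    return math.log(mean_t / mean_c)


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical summary statistics for one experiment (not from the chapter).
    print(hedges_g(10.0, 8.0, 2.0, 2.0, 10, 10))
    print(log_response_ratio(10.0, 8.0))

    # Hypothetical quality scores and bias values for five experiments.
    quality_scores = [3.0, 5.0, 6.0, 8.0, 9.0]
    bias_values = [0.9, 0.7, 0.6, 0.3, 0.2]
    print(pearson_r(quality_scores, bias_values))
```

A negative coefficient in the last line would correspond to higher quality scores going together with lower bias in this made-up data; the sketch makes no claim about the chapter's actual results.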


Footnotes

1. The differences in magnitude between the two coefficients can be attributed to the metrics used (effect size and response ratio); although conceptually interesting, they are less relevant than signs to this discussion.

2. We use the model stated in Eq. 1 in all cases.

3. The coefficients and statistical significance of the EXPERT-SCORE/bias correlation increase when the sample size N is added to the model and are statistically significant in all cases. This would suggest that experienced researchers do not consider sample size as a quality criterion. We did not include the reference to N in the main body of the text so as not to further complicate the discussion.

4. Inspection techniques are positioned somewhere in-between pair programming and elicitation techniques with respect to controllability.
Metadata
Title: Challenges of Evaluating the Quality of Software Engineering Experiments
Authors: Oscar Dieste, Natalia Juristo
Copyright Year: 2013
Publisher: Springer Berlin Heidelberg
DOI: https://doi.org/10.1007/978-3-642-37395-4_11
