A practical guide to controlled experiments of software engineering tools with human participants

Authors: Amy J. Ko, Thomas D. LaToza, Margaret M. Burnett

Published in: Empirical Software Engineering, Issue 1/2015 (01-02-2015)

Abstract

Empirical studies, often in the form of controlled experiments, have been widely adopted in software engineering research as a way to evaluate the merits of new software engineering tools. However, controlled experiments involving human participants actually using new tools are still rare and, when they are conducted, some have serious validity concerns. Recent research has also shown that many software engineering researchers view this form of tool evaluation as too risky and too difficult to conduct, since such experiments might ultimately lead to inconclusive or negative results. In this paper, we aim both to help researchers minimize the risks of this form of tool evaluation and to increase its quality, by offering practical methodological guidance on designing and running controlled experiments with developers. Our guidance fills gaps in the empirical literature by explaining, from a practical perspective, options in the recruitment and selection of human participants, informed consent, experimental procedures, demographic measurements, group assignment, training, the selection and design of tasks, the measurement of common outcome variables such as success and time on task, and study debriefing. Throughout, we situate this guidance in the results of a new systematic review of the tool evaluations published in over 1,700 software engineering papers from 2001 to 2011.
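As a concrete illustration of two of the topics the guidance covers, group assignment and the measurement of success and time on task, the sketch below shows one simple way an experimenter might script these steps. It is not taken from the paper: the blocked randomization strategy, the Python setting, and the names assign_balanced, run_task, and task_fn are illustrative assumptions.

    import random
    import time

    def assign_balanced(participant_ids, conditions=("control", "treatment"), seed=42):
        """Blocked random assignment: shuffle participants once with a fixed seed,
        then deal them round-robin into conditions so group sizes differ by at most one."""
        rng = random.Random(seed)
        ids = list(participant_ids)
        rng.shuffle(ids)
        return {pid: conditions[i % len(conditions)] for i, pid in enumerate(ids)}

    def run_task(participant_id, condition, task_fn, time_limit_s=1800):
        """Record time on task and success for one participant.
        task_fn is a stand-in for the actual instrumented task session."""
        start = time.monotonic()
        succeeded = task_fn(participant_id, condition)   # e.g., was the seeded bug fixed?
        elapsed = time.monotonic() - start
        return {
            "participant": participant_id,
            "condition": condition,
            "success": bool(succeeded),
            "time_on_task_s": min(elapsed, time_limit_s),  # cap at the session time limit
        }

Choices such as capping elapsed time at the session limit and defining success criteria before data collection are exactly the kinds of measurement decisions the paper's guidance discusses.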


Footnotes
1. Strictly speaking, an experiment is by definition quantitative (Basili 2007). Other kinds of empirical studies are not technically experiments; thus, in this paper, when we refer to experiments, we mean quantitative experiments.
2. Studies using this last method were often referred to by their authors as “case studies,” but this usage conflicts with the notion of a case study as an empirical investigation of some phenomenon within a real-life context (Yin 2003), since the tool use experience reports in these papers were not conducted in real-life contexts. “Case study” was also used to refer to evaluations without human use.
Literature
Anderson JR, Reiser BJ (1985) The LISP tutor. Byte 10:159–175
Atkins DL, Ball T, Graves TL, Mockus A (2002) Using version control data to evaluate the impact of software tools: A case study of the version editor. IEEE Trans Softw Eng 28(7):625–637
Bangor A, Kortum PT, Miller JT (2008) An empirical evaluation of the system usability scale. Int J Human-Comput Interact 24(6):574–594
Basili VR (1993) The experimental paradigm in software engineering. Int Work Exp Eng Issues: Crit Assess Futur Dir 706:3–12
Basili VR (1996) The role of experimentation in software engineering: Past, current, and future. International Conference on Software Engineering, 442–449
Basili VR (2007) The role of controlled experiments in software engineering research. Empirical Software Engineering Issues, LNCS 4336, Basili V et al. (Eds.), Springer-Verlag, 33–37
Basili VR, Selby RW, Hutchens DH (1986) Experimentation in software engineering. IEEE Trans Softw Eng, 733–743, July
Basili VR, Caldiera G, Rombach HD (1994) The goal question metric approach. In Encyclopedia of Software Engineering, John Wiley and Sons, 528–532
Beringer P (2004) Using students as subjects in requirements prioritization. International Symposium on Empirical Software Engineering, 167–176
Boehm BW, Papaccio PN (1988) Understanding and controlling software costs. IEEE Trans Softw Eng SE-14(10):1462–1477
Breaugh JA (2003) Effect size estimation: factors to consider and mistakes to avoid. J Manag 29(1):79–97
Bruun A, Gull P, Hofmeister L, Stage J (2009) Let your users do the testing: a comparison of three remote asynchronous usability testing methods. ACM Conference on Human Factors in Computing Systems, 1619–1628
Buse RPL, Sadowski C, Weimer W (2011) Benefits and barriers of user evaluation in software engineering research. ACM Conference on Systems, Programming, Languages and Applications
Carver J, Jaccheri L, Morasca S, Shull F (2003) Issues in using students in empirical studies in software engineering education. Software Metrics Symposium, 239–249
Chuttur MY (2009) Overview of the technology acceptance model: Origins, developments and future directions. Indiana University, USA, Sprouts: Working Papers on Information Systems
Davis FD (1989) Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Q 13(3):319
Dell N, Vaidyanathan V, Medhi I, Cutrell E, Thies W (2012) “Yours is better!” Participant response bias in HCI. ACM Conference on Human Factors in Computing Systems, 1321–1330
Dieste O, Grimán A, Juristo N, Saxena H (2011) Quantitative determination of the relationship between internal validity and bias in software engineering experiments: Consequences for systematic literature reviews. International Symposium on Empirical Software Engineering and Measurement, 285–294
Dig D, Manzoor K, Johnson R, Nguyen TN (2008) Effective software merging in the presence of object-oriented refactorings. IEEE Trans Softw Eng 34(3):321–335
Dybå T, Kampenes V, Sjøberg D (2006) A systematic review of statistical power in software engineering experiments. Inf Softw Technol 48(8):745–755
Dybå T, Prikladnicki R, Rönkkö K, Seaman C, Sillito J (2011) Qualitative research in software engineering. Empir Softw Eng 16(4):425–429
Easterbrook S, Singer J, Storey M, Damian D (2008) Selecting empirical methods for software engineering research. In Guide to Advanced Empirical Software Engineering, Springer, 285–311
Feigenspan J, Kastner C, Liebig J, Apel S, Hanenberg S (2012) Measuring programming experience. International Conference on Program Comprehension, 73–82
Fenton N (1993) How effective are software engineering methods? J Syst Softw 22(2):141–146
Flyvbjerg B (2006) Five misunderstandings about case study research. Qual Inq 12(2):219–245
Glass RL, Vessey I, Ramesh V (2002) Research in software engineering: an analysis of the literature. Inf Softw Technol 44(8):491–506
Golden E, John BE, Bass L (2005) The value of a usability-supporting architectural pattern in software architecture design: a controlled experiment. ACM/IEEE International Conference on Software Engineering
Greenberg S, Buxton B (2008) Usability evaluation considered harmful (some of the time). ACM Conference on Human Factors in Computing Systems, 111–120
Gwet KL (2010) Handbook of inter-rater reliability, 2nd edn. Advanced Analytics, Gaithersburg
Hanenberg S (2010) An experiment about static and dynamic type systems: doubts about the positive impact of static type systems on development time. ACM International Conference on Object-Oriented Programming Systems Languages and Applications (OOPSLA), 22–35
Hannay JE, Sjøberg DIK, Dyba T (2007) A systematic review of theory use in software engineering experiments. IEEE Trans Softw Eng 33(2):87–107
Holmes R, Walker RJ (2013) Systematizing pragmatic software reuse. ACM Trans Softw Eng Methodol 21(4), Article 20: 44 pages
John B, Packer H (1995) Learning and using the cognitive walkthrough method: a case study approach. ACM Conference on Human Factors in Computing Systems, 429–436
Juristo N, Moreno AM (2001) Basics of software engineering experimentation. Springer
Kampenes V, Dybå T, Hannay J, Sjøberg D (2007) A systematic review of effect size in software engineering experiments. Inf Softw Technol 49(11–12):1073–1086
Kaptein M, Robertson J (2012) Rethinking statistical analysis methods for CHI. ACM Conference on Human Factors in Computing Systems, 1105–1114
Kelleher C, Pausch R (2005) Stencils-based tutorials: design and evaluation. ACM Conference on Human Factors in Computing Systems, 541–550
Keppel G (1982) Design and analysis: a researcher’s handbook, 2nd edn. Prentice-Hall, Englewood Cliffs
Kersten M, Murphy G (2006) Using task context to improve programmer productivity. ACM Symposium on Foundations of Software Engineering, 1–11
Kitchenham BA, Pfleeger SL, Pickard LM, Jones PW, Hoaglin DC, El Emam K, Rosenberg J (2002) Preliminary guidelines for empirical research in software engineering. IEEE Trans Softw Eng 28(8):721–734
Kitchenham BA, Brereton P, Turner M, Niazi MK, Linkman S, Pretorius R, Budgen D (2010) Refining the systematic literature review process—two participant-observer case studies. Empir Softw Eng 15(6):618–653
Kittur A, Chi EH, Suh B (2008) Crowdsourcing user studies with Mechanical Turk. ACM Conference on Human Factors in Computing Systems, 453–456
Ko AJ, Myers BA (2009) Finding causes of program output with the Java Whyline. ACM Conference on Human Factors in Computing Systems, 1569–1578
Ko AJ, Wobbrock JO (2010) Cleanroom: edit-time error detection with the uniqueness heuristic. IEEE Symposium on Visual Languages and Human-Centric Computing, 7–14
Ko AJ, Burnett MM, Green TRG, Rothermel KJ, Cook CR (2002) Using the Cognitive Walkthrough to improve the design of a visual programming experiment. J Vis Lang Comput 13:517–544
Ko AJ, DeLine R, Venolia G (2007) Information needs in collocated software development teams. International Conference on Software Engineering (ICSE)
LaToza TD, Myers BA (2010) Developers ask reachability questions. International Conference on Software Engineering (ICSE), 185–194
LaToza TD, Myers BA (2011) Visualizing call graphs. IEEE Visual Languages and Human-Centric Computing (VL/HCC), Pittsburgh, PA
LaToza TD, Myers BA (2011) Designing useful tools for developers. ACM SIGPLAN Workshop on Evaluation and Usability of Programming Languages and Tools (PLATEAU), 45–50
Lazar J, Feng JH, Hochheiser H (2010) Research methods in human-computer interaction. Wiley
Lott C, Rombach D (1996) Repeatable software engineering experiments for comparing defect-detection techniques. Empir Softw Eng 1:241–277
Martin DW (1996) Doing psychology experiments, 4th edn. Brooks/Cole, Pacific Grove
McDowall D, McCleary R, Meidinger E, Hay RA (1980) Interrupted time series analysis, 1st edn. SAGE Publications
Murphy GC, Walker RJ, Baniassad ELA (1999) Evaluating emerging software development technologies: lessons learned from assessing aspect-oriented programming. IEEE Trans Softw Eng 25(4):438–455
Murphy-Hill E, Murphy GC, Griswold WG (2010) Understanding context: creating a lasting impact in experimental software engineering research. FSE/SDP Workshop on Future of Software Engineering Research
Newell A (1973) You can’t play 20 questions with nature and win: projective comments on the papers of this symposium. In: Chase WG (ed) Visual information processing. Academic, New York
Nickerson RS (1998) Confirmation bias: a ubiquitous phenomenon in many guises. Rev Gen Psychol 2(2):175–220
Nimmer JW, Ernst MD (2002) Invariant inference for static checking: an empirical evaluation. SIGSOFT Softw Eng Notes 27(6):11–20
Olsen DR (2007) Evaluating user interface systems research. ACM Symposium on User Interface Software and Technology, 251–258
Polson P, Lewis C, Rieman J, Wharton C (1992) Cognitive walkthroughs: a method for theory-based evaluation of user interfaces. Int J Human-Comput Interact 36:741–773
Ramesh V, Glass RL, Vessey I (2004) Research in computer science: an empirical study. J Syst Softw 70(1–2):165–176
Rombach HD, Basili VR, Selby RW (1992) Experimental software engineering issues: critical assessment and future directions. International Workshop, Dagstuhl Castle (Germany), Sept. 14–18
Rosenthal R (1966) Experimenter effects in behavioral research. Appleton, New York
Rosenthal R, Rosnow R (2007) Essentials of behavioral research: methods and data analysis, 3rd edn. McGraw-Hill
Rosenthal R, Rubin DB (1978) Interpersonal expectancy effects: the first 345 studies. Behav Brain Sci 1(3):377–386
Ross J, Irani L, Silberman MS, Zaldivar A, Tomlinson B (2010) Who are the crowdworkers? Shifting demographics in Mechanical Turk. ACM Conference on Human Factors in Computing Systems, 2863–2872
Rothermel KJ, Cook C, Burnett MM, Schonfeld J, Green TRG, Rothermel G (2000) WYSIWYT testing in the spreadsheet paradigm: an empirical evaluation. ACM International Conference on Software Engineering, 230–239
Rubin J, Chisnell D (2008) Handbook of usability testing: how to plan, design, and conduct effective tests. Wiley
Runeson P, Höst M (2009) Guidelines for conducting and reporting case study research in software engineering. Empir Softw Eng 14(2):131–164
Shull F, Singer J, Sjøberg DIK (2006) Guide to advanced empirical software engineering. Springer
Sillito J, Murphy G, De Volder K (2006) Questions programmers ask during software evolution tasks. ACM SIGSOFT/FSE, 23–34
Sjøberg DIK, Dybå T, Jørgensen M (2007) The future of empirical methods in software engineering research. In Future of Software Engineering (FOSE ’07), 358–378
Sjøberg D, Anda B, Arisholm E, Dyba T, Jorgensen M, Karahasanovic A, Koren E, Voka M (2003) Conducting realistic experiments in software engineering. Empirical Software Engineering and Measurement
Sjøberg DIK, Hannay JE, Hansen O, Kampenes VB, Karahasanović A, Liborg N-K, Rekdal AC (2005) A survey of controlled experiments in software engineering. IEEE Trans Softw Eng 31(9):733–753
Steele CM, Aronson J (1995) Stereotype threat and the intellectual test performance of African-Americans. J Personal Soc Psychol 69:797–811
Tichy WF (1998) Should computer scientists experiment more? 16 excuses to avoid experimentation. IEEE Comput 31(5):32–40
Tichy WF, Lukowicz P, Prechelt L, Heinz EA (1995) Experimental evaluation in computer science: a quantitative study. J Syst Softw 28(1):9–18
Walther JB (2002) Research ethics in internet-enabled research: human subjects issues and methodological myopia. Ethics Inf Technol 4(3):205–216
Wickelgren WA (1977) Speed-accuracy tradeoff and information processing dynamics. Acta Psychol 41(1):67–85
Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2000) Experimentation in software engineering: an introduction. Springer
Yin RK (2003) Case study research: design and methods. Sage Publications
Zannier C, Melnik G, Maurer F (2006) On the success of empirical studies in the international conference on software engineering. ACM/IEEE International Conference on Software Engineering, 341–350
Metadata
Title
A practical guide to controlled experiments of software engineering tools with human participants
Authors
Amy J. Ko
Thomas D. LaToza
Margaret M. Burnett
Publication date
01-02-2015
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 1/2015
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-013-9279-3
