
2016 | OriginalPaper | Book chapter

13. Fair Statistical Communication in HCI

Written by: Pierre Dragicevic

Published in: Modern Statistical Methods for HCI

Publisher: Springer International Publishing


Abstract

Statistics are tools to help end users accomplish their task. In research, to be qualified as usable, statistical tools should help researchers advance scientific knowledge by supporting and promoting the effective communication of research findings. Yet areas such as human-computer interaction (HCI) have adopted tools — i.e., p-values and dichotomous testing procedures — that have proven to be poor at supporting these tasks. The abusive use of these procedures has been severely criticized in a range of disciplines for several decades, suggesting that tools should be blamed, not end users. This chapter explains in a non-technical manner why it would be beneficial for HCI to switch to an estimation approach, i.e., reporting informative charts with effect sizes and interval estimates, and offering nuanced interpretations of our results. Advice is offered on how to communicate our empirical results in a clear, accurate, and transparent way without using any tests or p-values.


Footnotes
1
The width of confidence intervals generally increases with the variability of observations and decreases (somewhat slowly) with sample size (Cumming 2012). So either pill 1 has a much more consistent effect or the number of subjects was remarkably larger. Which of the two applies is not very important here.
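
As a rough illustration of this footnote, the sketch below computes the half-width of a confidence interval for a mean under a normal approximation (z × SD / √n). The function name `ci_half_width` and the numbers used are assumptions made for this example and do not come from the chapter; with small samples a t-based interval would be somewhat wider.

```python
import math
from statistics import NormalDist


def ci_half_width(sd, n, confidence=0.95):
    """Approximate half-width of a confidence interval for a mean.

    Uses the normal approximation z * sd / sqrt(n); this is an
    illustrative simplification, not the chapter's own procedure.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / math.sqrt(n)


# Hypothetical values: the same variability, growing sample sizes.
for n in (10, 40, 160):
    print(f"n={n:>3}  half-width={ci_half_width(sd=10.0, n=n):.2f}")

# Hypothetical values: the same sample size, growing variability.
for sd in (5.0, 10.0, 20.0):
    print(f"sd={sd:>4}  half-width={ci_half_width(sd=sd, n=40):.2f}")
```

Under this approximation, doubling the variability doubles the interval width, whereas halving the width requires four times as many observations, which is why the decrease with sample size is slow.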
 
2
The term effect size is often used in a narrower sense to refer to standardized effect sizes (Coe 2002, see also Chap. 5). Although sometimes useful, reporting standardized effect sizes is not always necessary nor is it always recommended (Baguley 2009; Wilkinson 1999, p. 599).
 
3
Briefly, statistical power is the probability of correctly detecting an effect whose magnitude has been postulated in advance. Power increases with the number of participants, with the size of the effect, and as variability decreases (see also Chap. 5).
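
The dependence of power on these three quantities can be sketched for the simplest textbook case, a two-sided one-sample z-test with known standard deviation. This setting, the function name `z_test_power` and the input values are assumptions chosen for illustration; the chapter does not prescribe this calculation.

```python
import math
from statistics import NormalDist


def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (illustrative assumption).

    effect: postulated true mean difference from the null value
    sd:     population standard deviation, assumed known
    n:      number of participants
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Postulated effect expressed in standard-error units.
    shift = effect * math.sqrt(n) / sd
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Hypothetical inputs showing each factor in turn.
print("baseline        :", round(z_test_power(effect=0.5, sd=1.0, n=20), 3))
print("more subjects   :", round(z_test_power(effect=0.5, sd=1.0, n=40), 3))
print("larger effect   :", round(z_test_power(effect=0.8, sd=1.0, n=20), 3))
print("more variability:", round(z_test_power(effect=0.5, sd=2.0, n=20), 3))
```

When the postulated effect is zero, the function returns the significance level alpha, which is a quick sanity check on the formula.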
 
4
Strictly speaking, Neyman–Pearson’s procedure involved choosing between the null hypothesis and an alternative hypothesis generally stating that the effect exists and takes some precise value. Accepting the null if the alternative hypothesis is true is a Type II error. Its frequentist probability is noted \(\beta \), and power is defined as \(1-\beta \). These notions are not important to the present discussion.
 
5
The sharp distinction between pills 2 and 3 is not a caricature. Due to Neyman–Pearson’s heritage, even pointing out that a non-significant p-value is close to .05 is often considered a serious fault.
 
6
Since computing \(\beta \) (or the probability of a Type II error) requires assigning a precise value to the population mean, \(\beta \) is also very unlikely to correspond to an actual probability or error rate.
 
7
For elements of discussion concerning this particular dichotomy, see Stewart-Oaten (1995), Norman (2010), Velleman and Wilkinson (1993), Wierdsma (2013), Abelson (1995, Chap. 1) and Gigerenzer (2004, pp. 587–588).
 
8
The meaning of robust here differs from its use in robust statistics, where it refers to robustness to outliers and to departures from statistical assumptions.
 
9
There is considerable debate on how to best collect and analyze questionnaire data, and I have not gone through enough of the literature to provide definitive recommendations. Likert scales are easy to analyze if they are constructed adequately, i.e., by averaging responses from multiple question items (see Carifio and Perla 2007). If responses to individual items are of interest, it can be sufficient to report all responses visually (see Tip 22). Visual analogue scales seem to be a promising option to consider if inferences need to be made on individual items (Reips and Funke 2008). However, analyzing many items individually is not recommended (see Tips 1, 5 and 30).
 
10
Both types of inferences can be combined using hierarchical or multi-level models, and tools exist for computing hierarchical confidence intervals (see Chap. 11).
 
11
For more on the important concepts of sampling distribution and the central limit theorem, see, e.g., Cumming (2013, Chap. 3) and the applet at http://tinyurl.com/sdsim.
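
For readers who prefer code to an applet, the following is a minimal simulation sketch of a sampling distribution. The skewed population (exponential with mean 1 and standard deviation 1), the sample size, the number of repetitions and the function name `simulate_sample_means` are all assumptions made for this example.

```python
import math
import random
import statistics


def simulate_sample_means(sample_size, repetitions, seed=1):
    """Draw many samples from a skewed population and return their means.

    The population is exponential with rate 1 (mean 1, SD 1); this choice
    is an illustrative assumption.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


means = simulate_sample_means(sample_size=30, repetitions=5000)
print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means  :", round(statistics.stdev(means), 3))
print("sigma / sqrt(n)     :", round(1.0 / math.sqrt(30), 3))
```

Although individual observations are strongly skewed, the sample means cluster around the population mean, and their spread should be close to σ/√n, as the central limit theorem suggests.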
 
12
Visual robustness is related to the concept of visual-data correspondence recently introduced in infovis (Kindlmann and Scheidegger 2014). The counterpart of robustness (i.e., a visualization’s ability to reveal differences in data) has been variously termed distinctness (Rensink 2014), power (Hofmann et al. 2012), and unambiguity (Kindlmann and Scheidegger 2014).
 
References
Abelson R (1995) Statistics as principled argument. Lawrence Erlbaum Associates
Abelson RP (1997) A retrospective on the significance test ban of 1999. What if there were no significance tests. pp 117–141
APA (2010) The publication manual of the APA, 6th edn. Washington, DC
Baguley T (2009) Standardized or simple effect size: what should be reported? Br J Psychol 100(3):603–617
Baguley T (2012) Calculating and graphing within-subject confidence intervals for ANOVA. Behav Res Meth 44(1):158–175
Bayarri MJ, Berger JO (2004) The interplay of Bayesian and frequentist analysis. Stat Sci 58–80
Beaudouin-Lafon M (2008) Interaction is the future of computing. In: McDonald DW, Erickson T (eds) HCI remixed, reflections on works that have influenced the HCI community. The MIT Press, pp 263–266
Bender R, Lange S (2001) Adjusting for multiple testing: when and how? J Clin Epidemiol 54(4):343–349
Beyth-Marom R, Fidler F, Cumming G (2008) Statistical cognition: towards evidence-based practice in statistics and statistics education. Stat Educ Res J 7(2):20–39
Brewer MB (2000) Research design and issues of validity. Handbook of research methods in social and personality psychology. pp 3–16
Brodeur A, Lé M, Sangnier M, Zylberberg Y (2012) Star wars: the empirics strike back. Paris school of economics working paper (2012–29)
Carifio J, Perla RJ (2007) Ten common misunderstandings, misconceptions, persistent myths and urban legends about Likert scales and Likert response formats and their antidotes. J Soc Sci 3(3):106
Chevalier F, Dragicevic P, Franconeri S (2014) The not-so-staggering effect of staggered animated transitions on visual tracking. IEEE Trans Visual Comput Graphics 20(12):2241–2250
Coe R (2002) It’s the effect size, stupid. In: Paper presented at the British Educational Research Association annual conference, vol 12. p 14
Cohen J (1990) Things I have learned (so far). Am Psychol 45(12):1304
Cohen J (1994) The Earth is round (p < .05). Am Psychol 49(12):997
Colquhoun D (2014) An investigation of the false discovery rate and the misinterpretation of p-values. Roy Soc Open Sci 1(3):140216
Correll M, Gleicher M (2014) Error bars considered harmful: exploring alternate encodings for mean and error. IEEE Trans Visual Comput Graphics 20(12):2142–2151
Cumming G (2008) Replication and p intervals: p values predict the future only vaguely, but confidence intervals do much better. Perspect Psychol Sci 3(4):286–300
Cumming G (2009b) Inference by eye: reading the overlap of independent confidence intervals. Stat Med 28(2):205–220
Cumming G (2012) Understanding the new statistics: effect sizes, confidence intervals, and meta-analysis. Multivariate applications series. Routledge Academic, London
Cumming G (2013) The new statistics: why and how. Psychol Sci
Cumming G, Finch S (2005) Inference by eye: confidence intervals and how to read pictures of data. Am Psychol 60(2):170
Cumming G, Fidler F, Vaux DL (2007) Error bars in experimental biology. J Cell Biol 177(1):7–11
Dawkins R (2011) The tyranny of the discontinuous mind. New Statesman 19:54–57
Dienes Z (2014) Using Bayes to get the most out of non-significant results. Front Psychol 5
Dragicevic P (2012) My technique is 20% faster: problems with reports of speed improvements in HCI. Research report
Dragicevic P, Chevalier F, Huot S (2014) Running an HCI experiment in multiple parallel universes. CHI extended abstracts. ACM, New York
Drummond GB, Vowler SL (2011) Show the data, don’t conceal them. Adv Physiol Educ 35(2):130–132
Duckworth WM, Stephenson WR (2003) Resampling methods: not just for statisticians anymore. In: 2003 joint statistical meetings
Ecklund A (2012) Beeswarm: the bee swarm plot, an alternative to stripchart. R package version 01
Fekete JD, Van Wijk JJ, Stasko JT, North C (2008) The value of information visualization. In: Information visualization. Springer, pp 1–18
Fidler F (2010) The American Psychological Association publication manual, 6th edn. Implications for statistics education. In: Data and context in statistics education: towards an evidence based society
Fidler F, Cumming G (2005) Teaching confidence intervals: problems and potential solutions. In: Proceedings of the 55th international statistics institute session
Fidler F, Loftus GR (2009) Why figures with error bars should replace p values. Zeitschrift für Psychologie/J Psychol 217(1):27–37
Fisher R (1955) Statistical methods and scientific induction. J Roy Stat Soc Ser B (Methodol): 69–78
Franz VH, Loftus GR (2012) Standard errors and confidence intervals in within-subjects designs: generalizing Loftus and Masson (1994) and avoiding the biases of alternative accounts. Psychon Bull Rev 19(3):395–404
Frick RW (1998) Interpreting statistical testing: process and propensity, not population and random sampling. Behav Res Meth Instrum Comput 30(3):527–535
Gardner MJ, Altman DG (1986) Confidence intervals rather than p values: estimation rather than hypothesis testing. BMJ 292(6522):746–750
Gelman A (2013a) Commentary: p-values and statistical practice. Epidemiology 24(1):69–72
Gelman A, Loken E (2013) The garden of forking paths. Online article
Gelman A, Stern H (2006) The difference between significant and not significant is not itself statistically significant. Am Stat 60(4):328–331
Gigerenzer G (2004) Mindless statistics. J Socio Econ 33(5):587–606
Gigerenzer G, Kruger L, Beatty J, Porter T, Daston L, Swijtink Z (1990) The empire of chance: how probability changed science and everyday life, vol 12. Cambridge University Press
Giner-Sorolla R (2012) Science or art? how aesthetic standards grease the way through the publication bottleneck but undermine science. Perspect Psychol Sci 7(6):562–571
Gliner JA, Leech NL, Morgan GA (2002) Problems with null hypothesis significance testing (NHST): what do the textbooks say? J Exp Educ 71(1):83–92
Goodman SN (1999) Toward evidence-based medical statistics. 1: the p value fallacy. Ann Intern Med 130(12):995–1004
Greenland S, Poole C (2013) Living with p values: resurrecting a Bayesian perspective on frequentist statistics. Epidemiology 24(1):62–68
Hager W (2002) The examination of psychological hypotheses by planned contrasts referring to two-factor interactions in fixed-effects ANOVA. Method Psychol Res, Online 7:49–77
Haller H, Krauss S (2002) Misinterpretations of significance: a problem students share with their teachers. Methods Psychol Res 7(1):1–20
Hoekstra R, Finch S, Kiers HA, Johnson A (2006) Probability as certainty: dichotomous thinking and the misuse of p values. Psychon Bull Rev 13(6):1033–1037
Hofmann H, Follett L, Majumder M, Cook D (2012) Graphical tests for power comparison of competing designs. IEEE Trans Visual Comput Graphics 18(12):2441–2448
Hornbæk K, Sander SS, Bargas-Avila JA, Grue Simonsen J (2014) Is once enough?: on the extent and content of replications in human-computer interaction. In: Proceedings of ACM, ACM conference on human factors in computing systems, pp 3523–3532
Jansen Y (2014) Physical and tangible information visualization. PhD thesis, Université Paris Sud-Paris XI
Kaptein M, Robertson J (2012) Rethinking statistical analysis methods for CHI. In: Proceedings of the SIGCHI conference on human factors in computing systems, ACM, pp 1105–1114
Keene ON (1995) The log transformation is special. Stat Med 14(8):811–819
Kerr NL (1998) HARKing: hypothesizing after the results are known. Pers Soc Psychol Rev 2(3):196–217
Kindlmann G, Scheidegger C (2014) An algebraic process for visualization design. IEEE Trans Visual Comput Graphics 20(12):2181–2190
Kirby KN, Gerlanc D (2013) BootES: an R package for bootstrap confidence intervals on effect sizes. Behav Res Methods 45(4):905–927
Kline RB (2004) What’s wrong with statistical tests–and where we go from here. Am Psychol Assoc
Lambdin C (2012) Significance tests as sorcery: science is empirical, significance tests are not. Theory Psychol 22(1):67–90
Lazic SE (2010) The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis? BMC Neurosci 11(1):5
Levine TR, Weber R, Hullett C, Park HS, Lindsey LLM (2008a) A critical assessment of null hypothesis significance testing in quantitative communication research. Hum Commun Res 34(2):171–187
Levine TR, Weber R, Park HS, Hullett CR (2008b) A communication researchers’ guide to null hypothesis significance testing and alternatives. Hum Commun Res 34(2):188–209
Loftus GR (1993) A picture is worth a thousand p values: on the irrelevance of hypothesis testing in the microcomputer age. Behav Res Meth Instrum Comput 25(2):250–256
MacCallum RC, Zhang S, Preacher KJ, Rucker DD (2002) On the practice of dichotomization of quantitative variables. Psychol Methods 7(1):19
Mazar N, Amir O, Ariely D (2008) The dishonesty of honest people: a theory of self-concept maintenance. J Mark Res 45(6):633–644
Meehl PE (1967) Theory-testing in psychology and physics: a methodological paradox. Philos Sci: 103–115
Miller J (1991) Short report: reaction time analysis with outlier exclusion: bias varies with sample size. Q J Exp Psychol 43(4):907–912
Newcombe RG (1998a) Interval estimation for the difference between independent proportions: comparison of eleven methods. Stat Med 17(8):873–890
Newcombe RG (1998b) Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat Med 17(8):857–872
Newman GE, Scholl BJ (2012) Bar graphs depicting averages are perceptually misinterpreted: the within-the-bar bias. Psychon Bull Rev 19(4):601–607
Norman DA (2002) The Design of Everyday Things. Basic Books Inc, New York
Norman G (2010) Likert scales, levels of measurement and the laws of statistics. Adv Health Sci Educ 15(5):625–632
Nuzzo R (2014) Scientific method: statistical errors. Nature 506(7487):150–152
Open Science Collaboration (2015) Estimating the reproducibility of psychological science. Science 349(6251):aac4716+
Osborne JW, Overbay A (2004) The power of outliers (and why researchers should always check for them). Pract Asses Res Eval 9(6):1–12
Perin C, Dragicevic P, Fekete JD (2014) Revisiting Bertin matrices: new interactions for crafting tabular visualizations. IEEE Trans Visual Comput Graphics 20(12):2082–2091
Pollard P, Richardson J (1987) On the probability of making Type I errors. Psychol Bull 102(1):159
Rawls RL (1998) Breaking up is hard to do. Chem Eng News 76(25):29–34
Reips UD, Funke F (2008) Interval-level measurement with visual analogue scales in internet-based research: VAS generator. Behav Res Methods 40(3):699–704
Rensink RA (2014) On the prospects for a science of visualization. In: Handbook of Human Centric Visualization. Springer, pp 147–175
Ricketts C, Berry J (1994) Teaching statistics through resampling. Teach Stat 16(2):41–44
Rosenthal R (2009) Artifacts in behavioral research: Robert Rosenthal and Ralph L. Rosnow’s Classic Books. Oxford University Press, Oxford
Rosenthal R, Fode KL (1963) The effect of experimenter bias on the performance of the albino rat. Behav Sci 8(3):183–189
Rosnow RL, Rosenthal R (1989) Statistical procedures and the justification of knowledge in psychological science. Am Psychol 44(10):1276
Rossi JS (1990) Statistical power of psychological research: what have we gained in 20 years? J Consult Clin Psychol 58(5):646
Sauro J, Lewis JR (2010) Average task times in usability tests: what to report? In: Proceedings of the SIGCHI conference on human factors in computing systems, ACM, pp 2347–2350
Schmidt FL, Hunter J (1997) Eight common but false objections to the discontinuation of significance testing in the analysis of research data. What if there were no significance tests. pp 37–64
Simmons JP, Nelson LD, Simonsohn U (2011) False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci 22(11):1359–1366
Smith RA, Levine TR, Lachlan KA, Fediuk TA (2002) The high cost of complexity in experimental design and data analysis: type I and type II error rates in multiway ANOVA. Hum Commun Res 28(4):515–530
Stewart-Oaten A (1995) Rules and judgments in statistics: three examples. Ecology: 2001–2009
Thompson B (1998) Statistical significance and effect size reporting: portrait of a possible future. Res Sch 5(2):33–38
Thompson B (1999) Statistical significance tests, effect size reporting and the vain pursuit of pseudo-objectivity. Theory Psychol 9(2):191–196
Tryon WW (2001) Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: an integrated alternative method of conducting null hypothesis statistical tests. Psychol Methods 6(4):371
Tukey JW (1980) We need both exploratory and confirmatory. Am Stat 34(1):23–25
Ulrich R, Miller J (1994) Effects of truncation on reaction time analysis. J Exp Psychol: Gen 123(1):34
van Deemter K (2010) Not exactly: in praise of vagueness. Oxford University Press, Oxford
Velleman PF, Wilkinson L (1993) Nominal, ordinal, interval, and ratio typologies are misleading. Am Stat 47(1):65–72
Zurück zum Zitat Vicente KJ, Torenvliet GL (2000) The Earth is spherical (p < 0.05): alternative methods of statistical inference. Theor Issues Ergon Sci 1(3):248–271 Vicente KJ, Torenvliet GL (2000) The Earth is spherical (p < 0.05): alternative methods of statistical inference. Theor Issues Ergon Sci 1(3):248–271
Zurück zum Zitat Wickham H, Stryjewski L (2011) 40 years of boxplots. Am Stat Wickham H, Stryjewski L (2011) 40 years of boxplots. Am Stat
Zurück zum Zitat Wilcox RR (1998) How many discoveries have been lost by ignoring modern statistical methods? Am Psychol 53(3):300CrossRef Wilcox RR (1998) How many discoveries have been lost by ignoring modern statistical methods? Am Psychol 53(3):300CrossRef
Zurück zum Zitat Wilkinson L (1999) Statistical methods in psychology journals: guidelines and explanations. Am Psychol 54(8):594CrossRef Wilkinson L (1999) Statistical methods in psychology journals: guidelines and explanations. Am Psychol 54(8):594CrossRef
Zurück zum Zitat Willett W, Jenny B, Isenberg T, Dragicevic P (2015) Lightweight relief shearing for enhanced terrain perception on interactive maps. In: Proceedings of ACM conference on human factors in computing systems. ACM, New York, NY, USA, CHI ’15, pp 3563–3572 Willett W, Jenny B, Isenberg T, Dragicevic P (2015) Lightweight relief shearing for enhanced terrain perception on interactive maps. In: Proceedings of ACM conference on human factors in computing systems. ACM, New York, NY, USA, CHI ’15, pp 3563–3572
Zurück zum Zitat Wilson W (1962) A note on the inconsistency inherent in the necessity to perform multiple comparisons. Psychol Bull 59(4):296CrossRef Wilson W (1962) A note on the inconsistency inherent in the necessity to perform multiple comparisons. Psychol Bull 59(4):296CrossRef
Zurück zum Zitat Wood M (2005) Bootstrapped confidence intervals as an approach to statistical inference. Organ Res Meth 8(4):454–470CrossRef Wood M (2005) Bootstrapped confidence intervals as an approach to statistical inference. Organ Res Meth 8(4):454–470CrossRef
Zurück zum Zitat Zacks J, Tversky B (1999) Bars and lines: a study of graphic communication. Mem Cogn 27(6):1073–1079CrossRef Zacks J, Tversky B (1999) Bars and lines: a study of graphic communication. Mem Cogn 27(6):1073–1079CrossRef
Zurück zum Zitat Ziliak ST, McCloskey DN (2008) The cult of statistical significance. University of Michigan Press, Ann Arbor Ziliak ST, McCloskey DN (2008) The cult of statistical significance. University of Michigan Press, Ann Arbor
Metadata
Title
Fair Statistical Communication in HCI
Written by
Pierre Dragicevic
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-26633-6_13
