Published in: International Journal of Data Science and Analytics 2/2017

15-05-2017 | Regular Paper

Caveats and pitfalls in crowdsourcing research: the case of soccer referee bias

Authors: Daniel Berrar, Philippe Lopes, Werner Dubitzky



Abstract

In a recent crowdsourcing project, 29 teams analyzed the same data set to address the following question: “Are football (soccer) referees more likely to give red cards to players with dark skin tone than to players with light skin tone?” The major finding was that the results of the individual teams varied widely, from no effect to highly significant correlations between skin color and the rate of red cards, which some teams interpreted as indicative of a referee bias. We analyzed the same data using a Poisson log-linear regression model and obtained an odds ratio of 1.34 (95% CI, 1.13–1.59), which means that players with darker skin tone do in fact have slightly higher odds of receiving a red card. This result agrees with the median odds ratio of 1.31 across all 29 teams. We then extended the original study by investigating the likelihood of receiving yellow cards. If a referee bias were in fact present, it would be plausible to see a similar association. However, players with darker skin tone were significantly less likely to receive a yellow card, with an odds ratio of 0.94 (95% CI, 0.91–0.97). The risk of receiving a card is most strongly affected by a player’s position, and there are significantly more players with darker skin tone at center back and defensive midfield, positions where receiving red cards is generally more likely. Taken together, our results do not support the hypothesis of a referee bias. Our most important finding, however, is that the perceived diversity of results from the crowdsourcing teams is due to placing too much emphasis on dichotomous decisions (significant vs. nonsignificant). When we focus on point estimates and their reasonable bounds, the individual substudies predominantly reinforce each other. We argue that data scientists should put less emphasis on statistical significance and instead focus more on the careful interpretation of confidence intervals or alternative methods for measuring the effect size and its precision.
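The abstract's argument rests on reading a ratio together with its interval rather than reducing it to a significant/nonsignificant verdict. The following is a minimal Python sketch of that arithmetic, under stated assumptions: the reported 95% intervals are treated as Wald intervals, symmetric on the log scale; the helper names (`rate_ratio_ci`, `log_se_from_ci`, `wald_p`, `overlaps`) are hypothetical; and `rate_ratio_ci` uses the closed-form maximum-likelihood estimate of a Poisson log-linear model with a single binary covariate and an exposure offset, a simplification that is not necessarily the authors' model specification. Strictly, the exponentiated coefficient of a Poisson model is a rate ratio; it approximates the odds ratio reported in the abstract when events such as red cards are rare.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def rate_ratio_ci(events_1, exposure_1, events_0, exposure_0, z=Z95):
    """Rate ratio exp(beta) and Wald CI for the Poisson log-linear model
    log(mu) = log(exposure) + alpha + beta * x, with x binary (hypothetical helper).
    For this model the MLE is closed-form: beta = log(rate_1 / rate_0),
    and its standard error is sqrt(1/events_1 + 1/events_0)."""
    beta = math.log((events_1 / exposure_1) / (events_0 / exposure_0))
    se = math.sqrt(1.0 / events_1 + 1.0 / events_0)
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def log_se_from_ci(lower, upper, z=Z95):
    """Back out the log-scale standard error from a reported ratio CI,
    assuming the CI is a Wald interval symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def wald_p(ratio, se):
    """Two-sided Wald p-value for H0: ratio = 1 (beta = 0); equals 2 * (1 - Phi(|z|))."""
    z_stat = math.log(ratio) / se
    return math.erfc(abs(z_stat) / math.sqrt(2))


def overlaps(ci_a, ci_b):
    """True if two (lower, upper) intervals share at least one value."""
    return max(ci_a[0], ci_b[0]) <= min(ci_a[1], ci_b[1])


if __name__ == "__main__":
    # Point estimates and 95% CIs as reported in the abstract.
    for label, ratio, lo, hi in [("red", 1.34, 1.13, 1.59), ("yellow", 0.94, 0.91, 0.97)]:
        se = log_se_from_ci(lo, hi)
        print(f"{label}: ratio={ratio}, log-SE={se:.4f}, p={wald_p(ratio, se):.2g}")

    # Hypothetical substudy with few events; these counts are invented for illustration.
    rr, lo, hi = rate_ratio_ci(30, 1000.0, 40, 2000.0)
    print(f"hypothetical: RR={rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
    print("overlaps reported red-card CI:", overlaps((lo, hi), (1.13, 1.59)))
```

Backing out the log-scale standard error shows that both reported intervals exclude 1. The hypothetical substudy is "nonsignificant" because its interval spans 1, yet its point estimate is similar and its interval overlaps the reported red-card interval; read as an estimate with bounds rather than as a verdict, it does not contradict the other result.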


Metadata
Title
Caveats and pitfalls in crowdsourcing research: the case of soccer referee bias
Authors
Daniel Berrar
Philippe Lopes
Werner Dubitzky
Publication date
15-05-2017
Publisher
Springer International Publishing
Published in
International Journal of Data Science and Analytics / Issue 2/2017
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI
https://doi.org/10.1007/s41060-017-0057-y
