Published in: Quality & Quantity 3/2014

01-05-2014

Intercoder reliability indices: disuse, misuse, and abuse

Author: Guangchao Charles Feng

Abstract

Although intercoder reliability has been considered crucial to the validity of a content study, the choice among the many available indices has been controversial. This study analyzed all content studies published in two major communication journals that reported intercoder reliability, aiming to determine how scholars conduct intercoder reliability tests. The results revealed that, over the past 30 years, some intercoder reliability indices were persistently misused with respect to the level of measurement, the number of coders, and the means of reporting reliability. Implications of misuse, disuse, and abuse are discussed, and suggestions regarding the proper choice of indices in various situations are offered.


Footnotes
1
Coders may also be called annotators, judges, raters, observers, or classifiers, depending on the research field. Intercoder and interrater are used interchangeably throughout the paper.
 
2
When the reliability value is far lower than the value of percent agreement (e.g., percent agreement is higher than 0.8 while the reliability value is close to or below 0), this may indicate that the marginal distribution is too skewed.
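This paradox can be reproduced with a short numeric sketch (an illustration, not from the paper): two hypothetical coders classify 100 items into two categories with heavily skewed marginals, yielding high percent agreement but a slightly negative Cohen's \(\kappa \).

```python
# Illustrative sketch: Cohen's kappa vs. percent agreement under a
# heavily skewed marginal distribution. Hypothetical confusion counts
# for two coders rating 100 items; rows = coder 1, columns = coder 2.
table = {("A", "A"): 90, ("A", "B"): 5,
         ("B", "A"): 5,  ("B", "B"): 0}

n = sum(table.values())
cats = ["A", "B"]

# Observed (percent) agreement: proportion of items on the diagonal.
p_o = sum(table[(c, c)] for c in cats) / n

# Expected chance agreement from the coders' marginal proportions.
marg1 = {c: sum(table[(c, d)] for d in cats) / n for c in cats}
marg2 = {c: sum(table[(d, c)] for d in cats) / n for c in cats}
p_e = sum(marg1[c] * marg2[c] for c in cats)

kappa = (p_o - p_e) / (1 - p_e)
print(f"percent agreement = {p_o:.2f}")    # 0.90
print(f"Cohen's kappa     = {kappa:.3f}")  # -0.053
```

Because both coders use category "A" 95 % of the time, chance agreement (0.905) exceeds observed agreement (0.90), so the chance-corrected index turns negative despite near-perfect raw agreement.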
 
3
It is identical to Bennett et al.'s (1954) \(S\) coefficient.
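For reference, Bennett et al.'s (1954) \(S\) corrects percent agreement for chance under the assumption that all \(k\) categories are equally likely. A minimal sketch (the formula is standard; the input values are hypothetical):

```python
# Bennett et al.'s (1954) S coefficient: chance-corrected agreement
# assuming a uniform distribution over the k categories, i.e.
#   S = (p_o - 1/k) / (1 - 1/k) = (k * p_o - 1) / (k - 1).
def bennett_s(p_o: float, k: int) -> float:
    return (k * p_o - 1) / (k - 1)

# Hypothetical values: 90% observed agreement over 2 categories.
print(bennett_s(0.90, 2))  # 0.8
```

Unlike \(\pi \) or \(\kappa \), \(S\) does not depend on the observed marginal distributions, which is why it is unaffected by the skewed-marginal paradox noted in footnote 2.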
 
4
As Lombard et al. (2002) argued, the proportion of studies using percent agreement was probably underestimated because most "NAs" would actually have adopted percent agreement.
 
5
They have corresponding multiple-coder versions proposed by other scholars. For instance, Fleiss (1971) extended \(\pi \), while Conger (1980) and Light (1971) proposed multiple-coder versions of \(\kappa \).
 
6
Cohen (1968) later proposed weighted \(\kappa \) for ordinal ratings. Krippendorff's (2004a) \(\alpha \) can be applied to all levels of measurement. Some indices, such as ICCs, are only applicable to interval ratings, while others, such as \(I_{r}\), Brennan and Prediger's (1981) \(\kappa \), and \(\pi \), do not have higher-level counterparts.
 
7
Although there is a consensus that percent agreement, including Holsti's variant, generally overestimates reliability because it makes no allowance for chance agreement, it is not considered misuse when used for nominal-scale codings. The rationale is explained below.
 
8
Whether standard errors should be reported for the obtained reliability value is still debated in the literature. Therefore, not reporting standard errors is not treated as a problem here.
 
9
There are many modeling approaches, such as log-linear, IRT (item response theory), latent class, and mixture modeling. In a separate study by the author, the log-linear modeling approach was found to be no better than most indices.
 
10
Although variables with binary outcomes belong to the nominal level of measurement, for most indices binary variables share more characteristics with interval variables.
 
Literature
Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Comput. Linguist. 34(4), 555–596 (2008)
Brennan, R., Prediger, D.: Coefficient kappa: some uses, misuses, and alternatives. Educ. Psychol. Meas. 41(3), 687 (1981)
Cicchetti, D., Feinstein, A.: High agreement but low kappa: II. Resolving the paradoxes. J. Clin. Epidemiol. 43(6), 551–558 (1990)
Cronbach, L.: Coefficient alpha and the internal structure of tests. Psychometrika 16(3), 297–334 (1951)
Fleiss, J.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378–382 (1971)
Gwet, K.: Inter-rater reliability: dependency on trait prevalence and marginal homogeneity. Stat. Methods Inter-Rater Reliab. Assess. Ser. 2, 1–9 (2002)
Gwet, K.: Computing inter-rater reliability and its variance in the presence of high agreement. Br. J. Math. Stat. Psychol. 61(1), 29–48 (2008)
Gwet, K.: Handbook of Inter-Rater Reliability: A Definitive Guide to Measuring the Extent of Agreement Among Multiple Raters. Advanced Analytics LLC, Gaithersburg (2010)
Holsti, O.: Content Analysis for the Social Sciences and Humanities. Addison-Wesley, Reading (1969)
Krippendorff, K.: Content Analysis: An Introduction to Its Methodology, 2nd edn. Sage, Thousand Oaks (2004a)
Krippendorff, K.: A dissenting view on so-called paradoxes of reliability coefficients. In: Salmon, C.T. (ed.) Communication Yearbook, vol. 36, pp. 481–500. Routledge, New York (2012)
Light, R.J.: Measures of response agreement for qualitative data: some generalizations and alternatives. Psychol. Bull. 76(5), 365–377 (1971)
Lin, L.: A concordance correlation coefficient to evaluate reproducibility. Biometrics 45(1), 255 (1989)
Lombard, M., Snyder-Duch, J.: Content analysis in mass communication: assessment and reporting of intercoder reliability. Hum. Commun. Res. 28(4), 587–604 (2002)
Osgood, C.: The representational model and relevant research methods. In: de Sola Pool, I. (ed.) Trends in Content Analysis, pp. 33–88. University of Illinois Press, Champaign (1959)
Riffe, D., Lacy, S., Fico, F.: Analyzing Media Messages: Using Quantitative Content Analysis in Research. Lawrence Erlbaum Associates, Mahwah (2005)
Scott, W.: Reliability of content analysis: the case of nominal scale coding. Public Opin. Q. 19, 321–325 (1955). doi:10.1086/266577
Spiegelman, M., Terwilliger, C., Fearing, F.: The reliability of agreement in content analysis. J. Soc. Psychol. 37, 175–187 (1953)
Zhao, X.: A Reliability Index (ai) that Assumes Honest Coders and Variable Randomness. Association for Education in Journalism and Mass Communication, Chicago (2012)
Zhao, X., Liu, J.S., Deng, K.: Assumptions behind inter-coder reliability indices. In: Salmon, C.T. (ed.) Communication Yearbook, vol. 36, pp. 419–480. Routledge, New York (2012)
Metadata
Title
Intercoder reliability indices: disuse, misuse, and abuse
Author
Guangchao Charles Feng
Publication date
01-05-2014
Publisher
Springer Netherlands
Published in
Quality & Quantity / Issue 3/2014
Print ISSN: 0033-5177
Electronic ISSN: 1573-7845
DOI
https://doi.org/10.1007/s11135-013-9956-8
