General estimators for the reliability of qualitative data

Psychometrika

Abstract

We study a proportional reduction in loss (PRL) measure for the reliability of categorical data and consider the general case in which each of N judges assigns a subject to one of K categories. This measure has been shown to be equivalent to a measure proposed by Perreault and Leigh for a special case when there are two equally competent judges, and the correct category has a uniform prior distribution. We consider a general framework where the correct category is assumed to have an arbitrary prior distribution, and where classification probabilities vary by correct category, judge, and category of classification. In this setting, we consider PRL reliability measures based on two estimators of the correct category: the empirical Bayes estimator and an estimator based on the judges' consensus choice. We also discuss four important special cases of the general model and study several types of lower bounds for PRL reliability.
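The ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The first function uses the commonly cited form of the Perreault and Leigh (1989) two-judge index, sqrt((F/n - 1/K) * K/(K - 1)), where F is the number of agreements among n subjects. The second computes a posterior over the correct category from a given prior and given classification probabilities, under the added assumption that judges classify independently given the correct category; in an empirical Bayes treatment these inputs would be estimated from data, which this sketch does not attempt. All function and parameter names are hypothetical.

```python
from collections import Counter
from math import prod, sqrt


def perreault_leigh_index(agreements, total, k):
    """Two-judge reliability index in the commonly cited Perreault-Leigh form.

    agreements: number of subjects on which both judges agree
    total:      number of subjects judged
    k:          number of categories
    """
    p_agree = agreements / total
    if p_agree < 1 / k:
        # The index is conventionally set to 0 when agreement is below chance.
        return 0.0
    return sqrt(max(0.0, (p_agree - 1 / k) * k / (k - 1)))


def posterior_over_categories(prior, confusion, ratings):
    """Posterior probability of each correct category given the judges' ratings.

    prior:     prior[c] = P(correct category is c)
    confusion: confusion[j][c][x] = P(judge j assigns x | correct category c)
    ratings:   ratings[j] = category assigned by judge j
    Assumption (not from the source): judges are conditionally independent
    given the correct category.
    """
    unnormalized = [
        prior[c] * prod(confusion[j][c][x] for j, x in enumerate(ratings))
        for c in range(len(prior))
    ]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized]


def consensus_choice(ratings):
    """Category chosen by the most judges (ties resolved by first occurrence)."""
    return Counter(ratings).most_common(1)[0][0]


if __name__ == "__main__":
    # Two equally competent judges, K = 2, uniform prior (illustrative numbers).
    judge = [[0.9, 0.1], [0.2, 0.8]]
    post = posterior_over_categories([0.5, 0.5], [judge, judge], [0, 0])
    print(post, consensus_choice([0, 0]))
    print(perreault_leigh_index(agreements=80, total=100, k=4))
```

The posterior's largest entry identifies the Bayes choice of correct category, which can be compared with the consensus choice for the same ratings.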

References

  • Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csáki (Eds.), 2nd International Symposium on Information Theory (pp. 267–281). Budapest: Akadémiai Kiadó.

  • Agresti, A. (1990). Categorical data analysis. New York: John Wiley & Sons.

  • Batchelder, W. H., & Romney, A. K. (1986). The statistical analysis of a general Condorcet model for dichotomous choice situations. In B. Grofman & G. Owen (Eds.), Information pooling and group decision making (pp. 103–112). Greenwich, CN: JAI Press.

  • Batchelder, W. H., & Romney, A. K. (1988). Test theory without an answer key. Psychometrika, 53, 193–224.

  • Batchelder, W. H., & Romney, A. K. (1989). New results in test theory without an answer key. In Edward E. Roskam (Ed.), Mathematical psychology in progress (pp. 229–248). Berlin, Heidelberg, New York: Springer-Verlag.

  • Clogg, C. C. (1981). New developments in latent structure analysis. In D. M. Jackson & E. F. Borgatta (Eds.), Factor analysis and measurement in sociological research (pp. 215–246). London: Sage.

  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.

  • Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.

  • Cooil, B., & Rust, R. T. (1994). Reliability and expected loss: A unifying principle. Psychometrika, 59, 203–216.

  • Costner, H. L. (1965). Criteria for measures of association. American Sociological Review, 30, 341–353.

  • Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.

  • Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: John Wiley & Sons.

  • David, F. N., & Barton, D. E. (1962). Combinatorial chance. London: Griffin.

  • David, H. A. (1981). Order statistics (2nd ed.). New York: John Wiley & Sons.

  • Dillon, W. R., & Mulani, N. (1984). A probabilistic latent class model for assessing inter-judge reliability. Multivariate Behavioral Research, 19, 438–458.

  • Haberman, S. J. (1974). Log-linear models for frequency tables derived by indirect observation. Maximum likelihood equations. Annals of Statistics, 2, 911–924.

  • Hughes, M. A., & Garrett, D. E. (1990). Intercoder reliability estimation approaches in marketing: A generalizability theory framework for quantitative data. Journal of Marketing Research, 27, 185–195.

  • Johnson, N. L., & Kotz, S. (1969). Discrete distributions. Boston, MA: Houghton Mifflin.

  • Kesten, H., & Morse, N. (1959). A property of the multinomial distribution. Annals of Mathematical Statistics, 30, 120–127.

  • Kozelka, R. M. (1956). Approximate upper percentage points for extreme values in multinomial sampling. Annals of Mathematical Statistics, 27, 507–512.

  • Loevinger, J. (1948). The technic of homogeneous tests compared with some aspects of “scale analysis” and factor analysis. Psychological Bulletin, 45, 507–530.

  • Marshall, A. W., & Olkin, I. (1979). Inequalities: Theory of majorization and its applications. New York: Academic Press.

  • Mellenbergh, G. J., & van der Linden, W. J. (1979). The internal and external optimality of decisions based on tests. Applied Psychological Measurement, 3, 257–273.

  • Perreault, W. D. Jr., & Leigh, L. E. (1989). Reliability of nominal data based on qualitative judgments. Journal of Marketing Research, 26, 135–148.

  • Romney, A. K., Weller, S. C., & Batchelder, W. H. (1986). Culture as consensus: A theory of culture and informant accuracy. American Anthropologist, 88, 313–338.

  • Rust, R. T., Simester, D., Brodie, R. J., & Nilikant, V. (in press). Model selection criteria: An investigation of relative accuracy, posterior probabilities, and combinations of criteria. Management Science.

  • Schouten, H. J. A. (1982). Measuring pairwise agreement among many observers, II: Some improvements and additions. Biometrical Journal, 24, 431–435.

  • Schouten, H. J. A. (1986). Nominal scale agreement among observers. Psychometrika, 51, 453–466.

  • Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.

  • White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50, 1–25.

  • Winer, B. J. (1971). Statistical principles in experimental design. New York: McGraw-Hill.

  • Woodroofe, M. (1982). On model selection and the arc sine laws. Annals of Statistics, 10, 1182–1194.

Author information

Corresponding author

Correspondence to Bruce Cooil.

Additional information

Bruce Cooil is Associate Professor of Statistics, and Roland T. Rust is Professor and area head for Marketing, Owen Graduate School of Management, Vanderbilt University. The authors thank three anonymous reviewers and an Associate Editor for their helpful comments and suggestions. This work was supported in part by the Dean's Fund for Faculty Research of the Owen Graduate School of Management, Vanderbilt University.

About this article

Cite this article

Cooil, B., Rust, R.T. General estimators for the reliability of qualitative data. Psychometrika 60, 199–220 (1995). https://doi.org/10.1007/BF02301413

  • DOI: https://doi.org/10.1007/BF02301413
