Abstract
We study a proportional reduction in loss (PRL) measure for the reliability of categorical data in the general case in which each of N judges assigns a subject to one of K categories. This measure has been shown to be equivalent to a measure proposed by Perreault and Leigh (1989) in the special case where there are two equally competent judges and the correct category has a uniform prior distribution. We consider a general framework in which the correct category may have an arbitrary prior distribution and the classification probabilities vary by correct category, judge, and assigned category. In this setting, we consider PRL reliability measures based on two estimators of the correct category: the empirical Bayes estimator and an estimator based on the judges' consensus choice. We also discuss four important special cases of the general model and study several types of lower bounds for PRL reliability.
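The contrast between the two estimators named above can be illustrated with a small Python sketch. It rests on assumptions that are ours, not necessarily the paper's: judges classify independently given the correct category; the prior and the classification probabilities `conf[j][k][l]` (judge j assigns category l when the correct category is k) are treated as known, whereas an empirical Bayes procedure would estimate them from data; loss is 0-1; and "PRL" is taken here as the proportional reduction in misclassification probability relative to always guessing the most probable category under the prior. All function names are hypothetical, and the paper's exact PRL definition may differ.

```python
import itertools
from typing import Callable, List, Sequence

# conf[j][k][l] = P(judge j assigns category l | correct category is k).
# ASSUMPTION: judges classify independently given the correct category.
Confusion = List[List[List[float]]]


def joint_weights(prior: Sequence[float], conf: Confusion,
                  labels: Sequence[int]) -> List[float]:
    """Return P(correct = k, observed labels) for each category k."""
    weights = []
    for k, pk in enumerate(prior):
        w = pk
        for j, l in enumerate(labels):
            w *= conf[j][k][l]
        weights.append(w)
    return weights


def bayes_estimate(prior: Sequence[float], conf: Confusion,
                   labels: Sequence[int]) -> int:
    """Posterior-mode (Bayes) estimate of the correct category."""
    w = joint_weights(prior, conf, labels)
    return max(range(len(w)), key=w.__getitem__)


def consensus_estimate(labels: Sequence[int], K: int) -> int:
    """Plurality (consensus) choice; ties go to the lowest category index."""
    counts = [labels.count(k) for k in range(K)]
    return max(range(K), key=counts.__getitem__)


def prl_reliability(prior: Sequence[float], conf: Confusion,
                    estimator: Callable[[tuple], int]) -> float:
    """1 - P(estimator errs) / P(guessing the prior mode errs), under 0-1 loss.

    Enumerates all K**N label vectors exactly; assumes max(prior) < 1.
    """
    K, N = len(prior), len(conf)
    p_error = 0.0
    for labels in itertools.product(range(K), repeat=N):
        guess = estimator(labels)
        w = joint_weights(prior, conf, labels)
        # Probability mass of (correct category, labels) pairs the guess misses.
        p_error += sum(wk for k, wk in enumerate(w) if k != guess)
    baseline = 1.0 - max(prior)  # error rate of always guessing the prior mode
    return 1.0 - p_error / baseline


def symmetric_judge(K: int, accuracy: float) -> List[List[float]]:
    """Hypothetical judge: correct w.p. `accuracy`, else uniform over the rest."""
    off = (1.0 - accuracy) / (K - 1)
    return [[accuracy if l == k else off for l in range(K)] for k in range(K)]


if __name__ == "__main__":
    K = 2
    prior = [0.5, 0.5]
    conf = [symmetric_judge(K, a) for a in (0.95, 0.6, 0.6)]
    r_bayes = prl_reliability(prior, conf, lambda x: bayes_estimate(prior, conf, x))
    r_cons = prl_reliability(prior, conf, lambda x: consensus_estimate(x, K))
    print(f"Bayes PRL: {r_bayes:.3f}, consensus PRL: {r_cons:.3f}")
```

With three hypothetical judges of accuracy 0.95, 0.6, and 0.6 on K = 2 equally likely categories, the script prints `Bayes PRL: 0.900, consensus PRL: 0.632`: the Bayes rule always defers to the strongest judge, whereas majority vote lets the two weaker judges outvote it. Exhaustive enumeration keeps the sketch exact but is practical only for small N and K.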
References
Agresti, A. (1990). Categorical data analysis. New York: John Wiley & Sons.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csáki (Eds.), 2nd International Symposium on Information Theory (pp. 267–281). Budapest: Akadémiai Kiadó.
Batchelder, W. H., & Romney, A. K. (1986). The statistical analysis of a general Condorcet model for dichotomous choice situations. In B. Grofman & G. Owen (Eds.), Information pooling and group decision making (pp. 103–112). Greenwich, CN: JAI Press.
Batchelder, W. H., & Romney, A. K. (1988). Test theory without an answer key. Psychometrika, 53, 193–224.
Batchelder, W. H., & Romney, A. K. (1989). New results in test theory without an answer key. In E. E. Roskam (Ed.), Mathematical psychology in progress (pp. 229–248). Berlin, Heidelberg, New York: Springer-Verlag.
Clogg, C. C. (1981). New developments in latent structure analysis. In D. M. Jackson & E. F. Borgatta (Eds.), Factor analysis and measurement in sociological research (pp. 215–246). London: Sage.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.
Cooil, B., & Rust, R. T. (1994). Reliability and expected loss: A unifying principle. Psychometrika, 59, 203–216.
Costner, H. L. (1965). Criteria for measures of association. American Sociological Review, 30, 341–353.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: John Wiley & Sons.
David, F. N., & Barton, D. E. (1962). Combinatorial chance. London: Griffin.
David, H. A. (1981). Order statistics (2nd ed.). New York: John Wiley & Sons.
Dillon, W. R., & Mulani, N. (1984). A probabilistic latent class model for assessing inter-judge reliability. Multivariate Behavioral Research, 19, 438–458.
Haberman, S. J. (1974). Log-linear models for frequency tables derived by indirect observation. Maximum likelihood equations. Annals of Statistics, 2, 911–924.
Hughes, M. A., & Garrett, D. E. (1990). Intercoder reliability estimation approaches in marketing: A generalizability theory framework for quantitative data. Journal of Marketing Research, 27, 185–195.
Johnson, N. L., & Kotz, S. (1969). Discrete distributions. Boston, MA: Houghton Mifflin.
Kesten, H., & Morse, N. (1959). A property of the multinomial distribution. Annals of Mathematical Statistics, 30, 120–127.
Kozelka, R. M. (1956). Approximate upper percentage points for extreme values in multinomial sampling. Annals of Mathematical Statistics, 27, 507–512.
Loevinger, J. (1948). The technic of homogeneous tests compared with some aspects of “scale analysis” and factor analysis. Psychological Bulletin, 45, 507–530.
Marshall, A. W., & Olkin, I. (1979). Inequalities: Theory of majorization and its applications. New York: Academic Press.
Mellenbergh, G. J., & van der Linden, W. J. (1979). The internal and external optimality of decisions based on tests. Applied Psychological Measurement, 3, 257–273.
Perreault, W. D. Jr., & Leigh, L. E. (1989). Reliability of nominal data based on qualitative judgments. Journal of Marketing Research, 26, 135–148.
Romney, A. K., Weller, S. C., & Batchelder, W. H. (1986). Culture as consensus: A theory of culture and informant accuracy. American Anthropologist, 88, 313–338.
Rust, R. T., Simester, D., Brodie, R. J., & Nilikant, V. (in press). Model selection criteria: An investigation of relative accuracy, posterior probabilities, and combinations of criteria. Management Science.
Schouten, H. J. A. (1982). Measuring pairwise agreement among many observers, II: Some improvements and additions. Biometrical Journal, 24, 431–435.
Schouten, H. J. A. (1986). Nominal scale agreement among observers. Psychometrika, 51, 453–466.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50, 1–25.
Winer, B. J. (1971). Statistical principles in experimental design. New York: McGraw-Hill.
Woodroofe, M. (1982). On model selection and the arc sine laws. Annals of Statistics, 10, 1182–1194.
Additional information
Bruce Cooil is Associate Professor of Statistics, and Roland T. Rust is Professor and area head for Marketing, Owen Graduate School of Management, Vanderbilt University. The authors thank three anonymous reviewers and an Associate Editor for their helpful comments and suggestions. This work was supported in part by the Dean's Fund for Faculty Research of the Owen Graduate School of Management, Vanderbilt University.
About this article
Cite this article
Cooil, B., Rust, R.T. General estimators for the reliability of qualitative data. Psychometrika 60, 199–220 (1995). https://doi.org/10.1007/BF02301413
DOI: https://doi.org/10.1007/BF02301413