
Cluster Analysis for Cognitive Diagnosis: Theory and Applications

  • Theory and Methods
  • Published in Psychometrika

Abstract

Latent class models for cognitive diagnosis often begin with specification of a matrix that indicates which attributes or skills are needed for each item. Then, by imposing restrictions that take this matrix into account, along with a theory governing how subjects interact with items, parametric formulations of item response functions are derived and fitted. Cluster analysis provides an alternative approach that does not require specifying an item response model, but does require an item-by-attribute matrix. After summarizing the data with a particular vector of sum-scores, K-means cluster analysis or hierarchical agglomerative cluster analysis can be applied with the purpose of clustering subjects who possess the same skills. Asymptotic classification accuracy results are given, along with simulations comparing effects of test length and method of clustering. An application to a language examination is provided to illustrate how the methods can be implemented in practice.
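The two-step procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that the sum-score for each attribute is the number of correct responses on the items that require that attribute according to the item-by-attribute matrix (the abstract does not spell out the definition), and that the number of clusters equals the number of possible attribute patterns. The function names, the toy data, and the plain Lloyd-style K-means are all hypothetical choices made for the example.

```python
import random


def attribute_sum_scores(responses, q_matrix):
    """Return one sum-score per attribute for every examinee.

    Assumed definition: score[i][k] = sum over items j of
    responses[i][j] * q_matrix[j][k].
    """
    n_attributes = len(q_matrix[0])
    return [
        [sum(y * q_row[k] for y, q_row in zip(resp, q_matrix))
         for k in range(n_attributes)]
        for resp in responses
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd-style K-means, seeded from distinct observed points."""
    rng = random.Random(seed)
    distinct = sorted(set(tuple(p) for p in points))
    if n_clusters > len(distinct):
        raise ValueError("more clusters requested than distinct points")
    centers = [list(p) for p in rng.sample(distinct, n_clusters)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(n_clusters),
                key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical item-by-attribute matrix: 5 items, 2 attributes.
# Items 1-2 need attribute 1, items 3-4 need attribute 2, item 5 needs both.
q_matrix = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]]

# Hypothetical error-free responses for 8 examinees (rows) on 5 items.
responses = [
    [1, 1, 1, 1, 1], [1, 1, 1, 1, 1],  # both attributes
    [1, 1, 0, 0, 0], [1, 1, 0, 0, 0],  # attribute 1 only
    [0, 0, 1, 1, 0], [0, 0, 1, 1, 0],  # attribute 2 only
    [0, 0, 0, 0, 0], [0, 0, 0, 0, 0],  # neither attribute
]

sum_scores = attribute_sum_scores(responses, q_matrix)
labels, centers = k_means(sum_scores, n_clusters=4)  # 2**2 attribute patterns
print(sum_scores)
print(labels)
```

The cluster labels produced this way are arbitrary integers: clustering groups examinees with similar sum-score profiles but does not by itself say which attribute pattern each group corresponds to. A hierarchical agglomerative method could be applied to the same sum-score vectors in place of the K-means step.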



Author information


Correspondence to Jeffrey A. Douglas.

Additional information

We would like to thank the English Language Institute at the University of Michigan for data and the National Science Foundation for funding (grant number 0648882).


About this article

Cite this article

Chiu, CY., Douglas, J.A. & Li, X. Cluster Analysis for Cognitive Diagnosis: Theory and Applications. Psychometrika 74, 633–665 (2009). https://doi.org/10.1007/s11336-009-9125-0


  • DOI: https://doi.org/10.1007/s11336-009-9125-0
