Published in: Advances in Data Analysis and Classification 3/2013

01.09.2013 | Regular Article

Clustering student skill set profiles in a unit hypercube using mixtures of multivariate betas

Authors: Nema Dean, Rebecca Nugent


Abstract

This paper presents a finite mixture of multivariate betas as a new model-based clustering method tailored to applications where the feature space is constrained to the unit hypercube. The mixture component densities are taken to be conditionally independent, univariate unimodal beta densities (from the subclass of reparameterized beta densities given by Bagnato and Punzo in Comput Stat 28(4), doi:10.1007/s00180-012-367-4, 2013). The EM algorithm used to fit this mixture is discussed in detail, and results from both this beta mixture model and the more standard Gaussian model-based clustering are presented for simulated skill mastery data from a common cognitive diagnosis model and for real data from the Assistment System online mathematics tutor (Feng et al. in J User Model User Adap Inter 19(3):243–266, 2009). The multivariate beta mixture appears to outperform the standard Gaussian model-based clustering approach, as would be expected on the constrained space. Fewer components are selected (by BIC-ICL) in the beta mixture than in the Gaussian mixture, and the resulting clusters seem more reasonable and interpretable.
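
As a rough illustration of the model class described in the abstract, the sketch below evaluates a mixture whose components are products of independent univariate beta densities and computes the E-step responsibilities of an EM iteration. It is a minimal sketch under stated assumptions, not the authors' implementation: it uses the standard (a, b) shape parameterization rather than the reparameterized unimodal betas of Bagnato and Punzo, it omits the M-step (which generally needs numerical optimization), and the function names and all parameter values are invented for illustration.

```python
import math

def beta_logpdf(x, a, b):
    """Log-density of a Beta(a, b) distribution at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)

def component_logpdf(x, shapes):
    """Log-density of one component: independent betas, one per dimension."""
    return sum(beta_logpdf(xj, a, b) for xj, (a, b) in zip(x, shapes))

def e_step(data, weights, components):
    """Return responsibilities resp[i][g] and the observed-data log-likelihood."""
    resp, loglik = [], 0.0
    for x in data:
        logs = [math.log(w) + component_logpdf(x, shapes)
                for w, shapes in zip(weights, components)]
        # Log-sum-exp for numerical stability.
        m = max(logs)
        log_total = m + math.log(sum(math.exp(v - m) for v in logs))
        resp.append([math.exp(v - log_total) for v in logs])
        loglik += log_total
    return resp, loglik

# Hypothetical two-component, two-skill example (values are assumptions).
weights = [0.5, 0.5]
components = [
    [(8.0, 2.0), (8.0, 2.0)],  # component with high mastery on both skills
    [(2.0, 8.0), (2.0, 8.0)],  # component with low mastery on both skills
]
data = [(0.9, 0.8), (0.1, 0.2), (0.5, 0.5)]
resp, loglik = e_step(data, weights, components)
```

Each row of `resp` holds one student's posterior membership probabilities across components. Because every coordinate lies strictly inside (0, 1), the beta densities place no mass outside the unit hypercube, which is the property the abstract contrasts with Gaussian components.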


References
Ayers E, Nugent R, Dean N (2008) Skill set profile clustering based on student capability vectors computed from online tutoring data. In: Baker R, Barnes T, Beck JE (eds) Proceedings of the 1st international conference on educational data mining. Montreal, Canada, pp 210–217
Ayers E, Nugent R, Dean N (2009) A comparison of student skill knowledge estimates. In: Barnes T, Desmarais M, Romero C, Ventura S (eds) Proceedings of the 2nd international conference on educational data mining. Cordoba, Spain, pp 1–10
Bagnato L, Punzo A (2013) Finite mixtures of unimodal beta and gamma densities and the \(k\)-bumps algorithm. Computational Statistics 28(4): doi:10.1007/s00180-012-367-4
Barnes TM (2005) The Q-matrix method: mining student response data for knowledge. In: Beck JE (ed) Educational data mining: papers from the 2005 AAAI workshop. American Association for Artificial Intelligence, Menlo Park, California, Technical Report WS-05-02, pp 39–46
Baudry JP, Raftery AE, Celeux G, Lo K, Gottardo R (2010) Combining mixture components for clustering. J Comput Graph Stat 19(2):332–353
Dean N, Nugent R (2011) Comparing different clustering models on the unit hypercube. In: Proceedings of the 58th world statistics congress. International Statistical Institute, Dublin
Dean N, Nugent R (2013) Mixture model component trees: Visualizing the hierarchical structure of complex groups. Tech. rep., University of Glasgow (in preparation)
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc Ser B Methodol 39(1):1–38 (with discussion)
DiBello L, Roussos L, Stout W (2007) Review of cognitively diagnostic assessment and a summary of psychometric models. In: Rao CR, Sinharay S (eds) Handbook of Statistics, 26. Elsevier, Amsterdam, pp 979–1030
Feng M, Heffernan N, Koedinger K (2009) Addressing the assessment challenge in an intelligent tutoring system that tutors as it assesses. J User Model User Adapt Inter 19(3):243–266
Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41:578–588
Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458):611–612
Fraley C, Raftery AE (2007) MCLUST version 3 for R: normal mixture modeling and model-based clustering. Tech. Rep. 504, Department of Statistics, University of Washington, Washington
Fraley C, Raftery AE, Murphy TB, Scrucca L (2012) mclust version 4 for R: Normal mixture modeling for model-based clustering, classification, and density estimation. Tech. Rep. 597, Department of Statistics, University of Washington, Washington
Hennig C (2010b) Ridgeline plot and clusterwise stability as tools for merging Gaussian mixture components. In: Locarek-Junge H, Weihs C (eds) Classification as a tool for research. Springer, Berlin, pp 109–116
Henson J, Templin R, Douglas J (2007) Using efficient model based sum-scores for conducting skill diagnoses. J Edu Measur 44(4):361–376
Hubert L, Arabie P (1985) Comparing partitions. J Class 2(1):193–218
Ji Y, Wu C, Liu P, Wang J, Coombes KR (2005) Applications of beta-mixture models in bioinformatics. Bioinformatics 21(9):2118–2122
Junker BW, Sijtsma K (2001) Cognitive assessment models with few assumptions and connections with nonparametric item response theory. Appl Psych Meas 25(3):258–272
Lazarsfeld PF, Henry PW (1968) Latent structure analysis. Houghton Mifflin, Boston
Lindsay BG (1995) Mixture models: theory, geometry, and applications. In: NSF-CBMS regional conference series in probability and statistics, vol. 5, Institute of Mathematical Statistics, Hayward
Maugis C, Celeux G, Martin-Magniette ML (2009) Variable selection in model-based clustering: a general variable role modeling. Comput Stat Data Anal 53(11):3872–3882
McLachlan G, Peel D (1999) The EMMIX algorithm for the fitting of normal and t-components. J Stat Softw 4(2):1–14
Nugent R, Ayers E, Dean N (2009) Conditional subspace clustering of skill mastery: identifying skills that separate students. In: Barnes T, Desmarais M, Romero C, Ventura S (eds) Proceedings of the 2nd international conference on educational data mining. Cordoba, Spain, pp 101–110
Rupp AA, Templin J, Henson RA (2010) Diagnostic measurement: theory, methods, and applications. Guilford Press, New York
Sokal RR, Rohlf JF (1981) Biometry: the principles and practice of statistics in biological research, 2nd edn. W. H. Freeman and Company, San Francisco
Torre JDL (2009) DINA model and parameter estimation: a didactic. J Edu Behav Stat 34(1):115–130
Metadata
Title
Clustering student skill set profiles in a unit hypercube using mixtures of multivariate betas
Authors
Nema Dean
Rebecca Nugent
Publication date
01.09.2013
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 3/2013
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-013-0149-z
