Educational Data Mining and Learning Analytics

Learning Analytics

Abstract

In recent years, two communities have grown around a joint interest in how big data can be exploited to benefit education and the science of learning: Educational Data Mining and Learning Analytics. This article discusses the relationship between these two communities and the key methods and approaches of educational data mining. It describes how these methods emerged in the early days of research in this area, which methods have seen particular interest in the EDM and learning analytics communities, and how this has changed as the field has matured and moved to making significant contributions to both educational research and practice.

Notes

  1. A fourth type of structure discovery, Network Analysis, is more characteristic of work in learning analytics than in educational data mining (cf. Dawson 2008; Suthers and Rosen 2011), and is not discussed in detail here for that reason.

References

  • Aleven, V., McLaren, B., Roll, I., & Koedinger, K. (2006). Toward meta-cognitive tutoring: A model of help seeking with a cognitive tutor. International Journal of Artificial Intelligence in Education, 16(2), 101–128.

  • Amershi, S., & Conati, C. (2009). Combining unsupervised and supervised classification to build user models for exploratory learning environments. Journal of Educational Data Mining, 1(1), 18–71.

  • Arroyo, I., & Woolf, B. (2005). Inferring learning and attitudes from a Bayesian Network of log file data. In: Proceedings of the 12th International Conference on Artificial Intelligence in Education (pp. 33–40).

  • Baker, R., Corbett, A. T., Koedinger, K., & Wagner, A. Z. (2004). Off-task behavior in the cognitive tutor classroom: When students game the system. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 383–390).

  • Baker, R., de Carvalho, A., Raspat, J., Aleven, V., Corbett, A., & Koedinger, K. (2009). Educational software features that encourage and discourage “gaming the system”. In: Proceedings of the International Conference on Artificial Intelligence in Education (pp. 475–482).

  • Baker, R., & Gowda, S. (2010). An analysis of the differences in the frequency of students’ disengagement in urban, rural, and suburban high schools. In: Proceedings of the 3rd International Conference on Educational Data Mining (pp. 11–20).

  • Baker, R., Gowda, S. M., & Corbett, A. T. (2011a). Towards predicting future transfer of learning. In G. Biswas, S. Bull, J. Kay, & A. Mitrovic (Eds.), Artificial intelligence in education: Vol. 6738. Lecture notes in computer science (pp. 23–30). Heidelberg, Germany: Springer.

  • Baker, R., Gowda, S. M., & Corbett, A. T. (2011b). Automatically detecting a student’s preparation for future learning: Help use is key. In: Proceedings of the 4th International Conference on Educational Data Mining (pp. 179–188).

  • Baker, R., Kalka, J., Aleven, V., Rossi, L., Gowda, S., Wagner, A., et al. (2012). Towards sensor-free affect detection in cognitive tutor algebra. In: Proceedings of the 5th International Conference on Educational Data Mining (pp. 126–133).

  • Baker, R., Walonoski, J., Heffernan, N., Roll, I., Corbett, A., & Koedinger, K. (2008). Why students engage in “gaming the system” behavior in interactive learning environments. Journal of Interactive Learning Research, 19(2), 185–224.

  • Baker, R., & Yacef, K. (2009). The state of educational data mining in 2009: A review and future visions. Journal of Educational Data Mining, 1(1), 3–17.

  • Bakharia, A., & Dawson, S. (2011). SNAPP: A bird’s-eye view of temporal participant interaction. In: Proceedings of the 1st International Conference on Learning Analytics and Knowledge (pp. 168–173).

  • Barnes, T. (2005). The q-matrix method: Mining student response data for knowledge. In: Proceedings of the American Association for Artificial Intelligence 2005 Educational Data Mining Workshop (pp. 39–46).

  • Barnes, T., Bitzer, D., & Vouk, M. (2005). Experimental analysis of the q-matrix method in knowledge discovery. In M.-S. Hacid, N. Murray, Z. Raś, & S. Tsumoto (Eds.), Foundations of intelligent systems: Vol. 3488. Lecture notes in computer science (pp. 603–611). Heidelberg, Germany: Springer.

  • Beal, C. R., Qu, L., & Lee, H. (2006). Classifying learner engagement through integration of multiple data sources. In: Proceedings of the 21st National Conference on Artificial Intelligence (pp. 151–156).

  • Beheshti, B., & Desmarais, M. (2012). Improving matrix factorization techniques of student test data with partial order constraints. In: Proceedings of the 20th International Conference on User Modeling, Adaptation, and Personalization (pp. 346–350).

  • Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, 57(1), 289–300.

  • Ben-Naim, D., Bain, M., & Marcus, N. (2009). A user-driven and data-driven approach for supporting teachers in reflection and adaptation of adaptive tutorials. In: Proceedings of the 2nd International Conference on Educational Data Mining (pp. 21–30).

  • Bouchet, F., Azevedo, R., Kinnebrew, J., & Biswas, G. (2012). Identifying students’ characteristic learning behaviors in an intelligent tutoring system fostering self-regulated learning. In: Proceedings of the 5th International Conference on Educational Data Mining (pp. 65–72).

  • Brin, S., Motwani, R., Ullman, J., & Tsur, S. (1997). Dynamic itemset counting and implication rules for market basket data. In: Proceedings of the 1997 ACM International Conference on Management of Data (pp. 255–264).

  • Cen, H., Koedinger, K., & Junker, B. (2006). Learning factors analysis—A general method for cognitive model evaluation and improvement. In M. Ikeda, K. Ashley, & T.-W. Chan (Eds.), Intelligent tutoring systems: Vol. 4053. Lecture notes in computer science (pp. 164–175). Heidelberg, Germany: Springer.

  • Cen, H., Koedinger, K., & Junker, B. (2007). Is over practice necessary?—Improving learning efficiency with the cognitive tutor through educational data mining. In: Proceedings of the 13th International Conference on Artificial Intelligence in Education (pp. 511–518).

  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.

  • Corbett, A., & Anderson, J. (1995). Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4(4), 253–278.

  • d’Aquin, M., & Jay, N. (2013). Interpreting data mining results with linked data for learning analytics: Motivation, case study and directions. In: Proceedings of the 3rd International Conference on Learning Analytics and Knowledge (pp. 155–164).

  • D’Mello, S., Craig, S., Witherspoon, A., McDaniel, B., & Graesser, A. (2008). Automatic detection of learner’s affect from conversational cues. User Modeling and User-Adapted Interaction, 18(1–2), 45–80.

  • D’Mello, S., Olney, A., & Person, N. (2010). Mining collaborative patterns in tutorial dialogues. Journal of Educational Data Mining, 2(1), 1–37.

  • Davis, J., & Goadrich, M. (2006). The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning (pp. 233–240).

  • Dawson, S. (2008). A study of the relationship between student social networks and sense of community. Educational Technology and Society, 11(3), 224–238.

  • Dekker, G., Pechenizkiy, M., & Vleeshouwers, J. (2009). Predicting students drop out: A case study. In: Proceedings of the 2nd International Conference on Educational Data Mining (pp. 41–50).

  • Desmarais, M. (2011). Conditions for effectively deriving a q-matrix from data with non-negative matrix factorization. In: Proceedings of the 4th International Conference on Educational Data Mining (pp. 41–50).

  • Desmarais, M., Beheshti, B., & Naceur, R. (2012). Item to skills mapping: Deriving a conjunctive q-matrix from data. In S. A. Cerri, W. J. Clancey, G. Papadourakis, & K.-K. Panourgia (Eds.), Intelligent tutoring systems: Vol. 7315. Lecture notes in computer science (pp. 454–463). Heidelberg, Germany: Springer.

  • Fancsali, S. (2012). Variable construction and causal discovery for cognitive tutor log data: Initial results. In: Proceedings of the 5th International Conference on Educational Data Mining (pp. 238–239).

  • Feng, M., & Heffernan, N. (2007). Towards live informing and automatic analyzing of student learning: Reporting in the assistment system. Journal of Interactive Learning Research, 18(2), 207–230.

  • Feng, M., Heffernan, N., & Koedinger, K. (2009). Addressing the assessment challenge with an online system that tutors as it assesses. User Modeling and User-Adapted Interaction, 19(3), 243–266.

  • Goldin, I., Koedinger, K. R., & Aleven, V. (2012). Learner differences in hint processing. In: Proceedings of the 5th International Conference on Educational Data Mining (pp. 73–80).

  • Gong, Y., Beck, J. E., & Heffernan, N. T. (2011). How to construct more accurate student models: Comparing and optimizing knowledge tracing and performance factor analysis. International Journal of Artificial Intelligence in Education, 21(1), 27–46.

  • Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36.

  • Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley.

  • Kay, J., Maisonneuve, N., Yacef, K., & Zaïane, O. (2006). Mining patterns of events in students’ teamwork data. In: Proceedings of the Workshop on Educational Data Mining at the 8th International Conference on Intelligent Tutoring Systems (pp. 45–52).

  • Kinnebrew, J., & Biswas, G. (2012). Identifying learning behaviors by contextualizing differential sequence mining with action features and performance evolution. In: Proceedings of the 5th International Conference on Educational Data Mining (pp. 57–64).

  • Kline, P. (1993). An easy guide to factor analysis. London: Routledge.

  • Koedinger, K., McLaughlin, E., & Stamper, J. (2012). Automated student model improvement. In: Proceedings of the 5th International Conference on Educational Data Mining (pp. 17–24).

  • Lin, J., Keogh, E., Lonardi, S., & Patel, P. (2002). Finding motifs in time series. In: Proceedings of the 2nd Workshop on Temporal Data Mining (pp. 53–68).

  • Martin, J., & VanLehn, K. (1995). Student assessment using Bayesian nets. International Journal of Human Computer Studies, 42(6), 575–592.

  • Martinez, R., Yacef, K., Kay, J., Kharrufa, A., & Al-Qaraghuli, A. (2011). Analysing frequent sequential patterns of collaborative learning activity around an interactive tabletop. In: Proceedings of the 4th International Conference on Educational Data Mining (pp. 111–120).

  • Merceron, A., & Yacef, K. (2005). Educational data mining: A case study. In: Proceedings of the 2005 Conference on Artificial Intelligence in Education: Supporting Learning Through Socially Informed Technology (pp. 467–474).

  • Merceron, A., & Yacef, K. (2008). Interestingness measures for association rules in educational data. In: Proceedings of the 1st International Conference on Educational Data Mining (pp. 57–66).

  • Minaei-Bidgoli, B., Kashy, D., Kortmeyer, G., & Punch, W. (2003). Predicting student performance: An application of data mining methods with an educational web-based system. In: Frontiers in Education, 2003. FIE 2003 33rd Annual (pp. T2A 13–18). (http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1263284&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F8925%2F28250%2F01263284.pdf%3Farnumber%3D1263284#).

  • Pardos, Z., Baker, R., San Pedro, M., Gowda, S., & Gowda, S. (2013). Affective states and state tests: Investigating how affect throughout the school year predicts end of year learning outcomes. In: Proceedings of the 3rd International Conference on Learning Analytics and Knowledge (pp. 117–124).

  • Pardos, Z. A., Gowda, S. M., Baker, R., & Heffernan, N. T. (2012). The sum is greater than the parts: Ensembling models of student knowledge in educational software. ACM SIGKDD Explorations Newsletter, 13(2), 37–44.

  • Pavlik, P., Cen, H., & Koedinger, K. R. (2009). Performance factors analysis—A new alternative to knowledge tracing. In: Proceedings of the 14th International Conference on Artificial Intelligence in Education (pp. 531–538).

  • Perera, D., Kay, J., Koprinska, I., Yacef, K., & Zaïane, O. R. (2009). Clustering and sequential pattern mining of online collaborative learning data. IEEE Transactions on Knowledge and Data Engineering, 21(6), 759–772.

  • Rai, D., & Beck, J. (2011). Exploring user data from a game-like math tutor: A case study in causal modeling. In: Proceedings of the 4th International Conference on Educational Data Mining (pp. 307–313).

  • Rau, A., & Scheines, R. (2012). Searching for variables and models to investigate mediators of learning from multiple representations. In: Proceedings of the 5th International Conference on Educational Data Mining (pp. 110–117).

  • Roll, I., Aleven, V., McLaren, B. M., & Koedinger, K. R. (2007). Can help seeking be tutored? Searching for the secret sauce of metacognitive tutoring. In: Proceedings of the 13th International Conference on Artificial Intelligence in Education, Marina del Rey, CA (pp. 203–210).

  • Romero, C., & Ventura, S. (2007). Educational data mining: A survey from 1995 to 2005. Expert Systems with Applications, 33(1), 135–146.

  • Romero, C., & Ventura, S. (2010). Educational data mining: A review of the state of the art. IEEE Transactions on Systems, Man, and Cybernetics Part C: Applications and Reviews, 40(6), 601–618.

  • Rus, V., Moldovan, C., Graesser, A., & Niraula, N. (2012). Automated discovery of speech act categories in educational games. In: Proceedings of the 5th International Conference on Educational Data Mining (pp. 25–32).

  • San Pedro, M., Baker, R., Bowers, A., & Heffernan, N. (2013). Predicting college enrollment from student interaction with an intelligent tutoring system in middle school. In: Proceedings of the 6th International Conference on Educational Data Mining (pp. 177–184).

  • Sao Pedro, M., Baker, R., Montalvo, O., Nakama, A., & Gobert, J. D. (2010). Using text replay tagging to produce detectors of systematic experimentation behavior patterns. In: Proceedings of the 3rd International Conference on Educational Data Mining (pp. 181–190).

  • Scheines, R., Spirtes, P., Glymour, C., Meek, C., & Richardson, T. (1998). The TETRAD project: Constraint based aids to causal model specification. Multivariate Behavioral Research, 33(1), 65–117.

  • Scheuer, O., & McLaren, B. M. (2011). Educational data mining. The encyclopedia of the sciences of learning. New York: Springer.

  • Schreurs, B., Teplovs, C., Ferguson, R., De Laat, M., & Buckingham Shum, S. (2013). Visualizing social learning ties by type and topic: Rationale and concept demonstrator. In: Proceedings of the 3rd International Conference on Learning Analytics and Knowledge (pp. 33–37).

  • Shanabrook, D. H., Cooper, D. G., Woolf, B. P., & Arroyo, I. (2010). Identifying high-level student behavior using sequence-based motif discovery. In: Proceedings of the 3rd International Conference on Educational Data Mining (pp. 191–200).

  • Shute, V. J. (1995). SMART: Student modeling approach for responsive tutoring. User Modeling and User-Adapted Interaction, 5(1), 1–44.

  • Siemens, G., & Baker, R. (2012). Learning analytics and educational data mining: Towards communication and collaboration. In: Proceedings of the 2nd International Conference on Learning Analytics and Knowledge (pp. 252–254).

  • Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction, and search. Cambridge, MA: MIT Press.

  • Srikant, R., & Agrawal, R. (1996). Mining sequential patterns: Generalizations and performance improvements. Heidelberg, Germany: Springer.

  • Storey, J. D. (2003). The positive false discovery rate: A Bayesian interpretation and the q-value. Annals of Statistics, 31(6), 2013–2035.

  • Suthers, D., & Rosen, D. (2011). A unified framework for multi-level analysis of distributed learning. In: Proceedings of the 1st International Conference on Learning Analytics and Knowledge (pp. 64–74).

  • Tatsuoka, K. (1995). Architecture of knowledge structures and cognitive diagnosis: A statistical pattern recognition and classification approach. In P. Nichols, S. Chipman, & R. Brennan (Eds.), Cognitively diagnostic assessment (pp. 327–359). London: Routledge.

  • Thai-Nghe, N., Horvath, T., & Schmidt-Thieme, L. (2011). Context-aware factorization for personalized student’s task recommendation. In: Proceedings of the International Workshop on Personalization Approaches in Learning Environments (pp. 13–18).

  • Vuong, A., Nixon, T., & Towle, B. (2011). A method for finding prerequisites within a curriculum. In: Proceedings of the 4th International Conference on Educational Data Mining (pp. 211–216).

Author information

Corresponding author

Correspondence to Ryan Shaun Baker.

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Baker, R.S., Inventado, P.S. (2014). Educational Data Mining and Learning Analytics. In: Larusson, J., White, B. (eds) Learning Analytics. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3305-7_4
