DOI: 10.1145/2883851.2883931
Research article
Public Access

Combining click-stream data with NLP tools to better understand MOOC completion

Published: 25 April 2016

ABSTRACT

Completion rates for massive open online courses (MOOCs) are notoriously low. Identifying patterns of student behavior related to course completion may help in developing interventions that improve retention and learning outcomes in MOOCs. Previous research on predicting MOOC completion has focused on click-stream data, student demographics, and natural language processing (NLP) analyses, but most of this work has not taken full advantage of the multiple types of data available. This study combines click-stream data with NLP approaches to examine whether students' online activity and the language they produce in the discussion forums are predictive of successful course completion. We conduct this analysis on a subsample of 320 students in a MOOC on educational data mining who completed at least one graded assignment and produced at least 50 words in the discussion forums. The findings indicate that a combination of click-stream data and NLP indices can predict with substantial accuracy (78%) whether students complete the MOOC. This predictive power suggests that interaction data and language data within a MOOC can help us both to understand student retention and to develop automated signals of student success.
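To make the general approach concrete, the following is a minimal, illustrative sketch of how per-student click-stream counts and NLP indices from forum text might be combined in a single completion classifier. It is not the authors' pipeline: the feature names, the input file mooc_features.csv, and the choice of logistic regression with 10-fold cross-validation are assumptions for illustration only, and the paper's reported 78% accuracy comes from its own models, not from this code.

# Illustrative sketch only: not the authors' pipeline. Feature names,
# the input file, and the classifier choice are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One row per student: click-stream counts plus NLP indices computed from
# at least 50 words of that student's discussion-forum posts.
students = pd.read_csv("mooc_features.csv")  # hypothetical file

clickstream = ["n_lecture_views", "n_forum_posts", "n_assignment_attempts"]
nlp_indices = ["lexical_sophistication", "text_cohesion", "sentiment"]

X = students[clickstream + nlp_indices]
y = students["completed"]  # 1 = completed the MOOC, 0 = did not

# A simple, interpretable baseline: standardize the features, fit a logistic
# regression, and estimate accuracy with 10-fold cross-validation.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
print(f"Mean cross-validated accuracy: {scores.mean():.2f}")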

Published in

LAK '16: Proceedings of the Sixth International Conference on Learning Analytics & Knowledge
April 2016, 567 pages
ISBN: 9781450341905
DOI: 10.1145/2883851

Copyright © 2016 ACM. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance rates: LAK '16 accepted 36 of 116 submissions (31%); the overall acceptance rate for the conference series is 236 of 782 submissions (30%).
