13-02-2025 | Regular Paper

Predicting learning performance using NLP: an exploratory study using two semantic textual similarity methods

Authors: C. Papadimas, V. Ragazou, I. Karasavvidis, V. Kollias

Published in: Knowledge and Information Systems

Abstract

Most learning analytics (LA) systems provide generic feedback, because they primarily draw on performance data based on quiz scores. This study explored the potential of student-generated summaries as an alternative method for predicting learning performance. Two hundred and fifty-four undergraduates first watched a series of six short video lectures and then wrote a short summary for each one. Based on the median of their quiz performance scores, the participants were divided into two performance groups. Sparse and dense text vectorization methods were used to represent the video lectures and student summaries. Three semantic textual similarity features were computed using cosine similarity and were used as input into seven common machine learning algorithms. The results indicated that the sparse similarity features outperformed the dense ones in classifying performance. In addition, the best classification accuracy was achieved using the K-Nearest Neighbors and Random Forest algorithms. Overall, the findings suggest that semantic similarity measures can be used as additional proxy measures of learning, thereby enabling the real-time monitoring and evaluation of student understanding in LA contexts.
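The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the specific sparse or dense vectorizers, so plain bag-of-words term counts stand in for the sparse representation here. The function names, the toy texts, the quiz scores, and the rule that scores equal to the median fall into the lower group are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from statistics import median


def sparse_vector(text):
    """Bag-of-words term counts; an assumed stand-in for a sparse representation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty text has no direction, so treat it as dissimilar
    return dot / (norm_u * norm_v)


def median_split(scores):
    """Label each score 1 (above the median) or 0 (at or below it).

    How ties at the median are handled is an assumption of this sketch.
    """
    cutoff = median(scores)
    return [1 if s > cutoff else 0 for s in scores]


# Toy data, invented for illustration only.
lecture = "Cosine similarity compares the angle between two text vectors."
summaries = [
    "Cosine similarity compares the angle between text vectors.",
    "The lecture was about cooking pasta.",
]

# One similarity feature per summary: lecture transcript vs. student summary.
similarities = [
    cosine_similarity(sparse_vector(lecture), sparse_vector(s)) for s in summaries
]

# Hypothetical quiz scores turned into two performance groups.
labels = median_split([9, 4, 7, 6])
```

In a setup like the one the abstract describes, similarity values of this kind would serve as input features for a classifier, and the median-split labels would serve as the prediction target.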

Metadata
Title
Predicting learning performance using NLP: an exploratory study using two semantic textual similarity methods
Authors
C. Papadimas
V. Ragazou
I. Karasavvidis
V. Kollias
Publication date
13-02-2025
Publisher
Springer London
Published in
Knowledge and Information Systems
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-024-02293-2
