
2016 | OriginalPaper | Book chapter

A Random Forest-Based Self-training Algorithm for Study Status Prediction at the Program Level: minSemi-RF

Authors: Vo Thi Ngoc Chau, Nguyen Hua Phung

Published in: Multi-disciplinary Trends in Artificial Intelligence

Publisher: Springer International Publishing


Abstract

Educational data mining aims to uncover useful knowledge hidden in educational data in order to better support educational decision making. However, a large set of educational data is not always readily available for a data mining task, owing to the peculiarities of the academic system and to the timing of data collection. In our work, we focus on a study status prediction task at the program level, where the data are collected and processed once a year over the time frame of the program of interest in an academic credit system. When little labeled educational data is available for the task, its effectiveness may suffer. The task should therefore be treated as a semi-supervised learning problem rather than a conventional supervised one, so that a larger set of unlabeled data can be exploited. In particular, we define a random forest-based self-training algorithm, named minSemi-RF, for the study status prediction task at the program level. The minSemi-RF algorithm combines the Tri-training and Self-training styles in such a way that a random forest-based self-training algorithm becomes a parameter-free variant of the Tri-training algorithm. The algorithm produces a final classifier that inherits the advantages of a random forest model. Experiments conducted on real data sets show that, compared with some existing semi-supervised learning methods, our algorithm is effective and practical for the early detection of in-trouble students in an academic credit system.
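The abstract does not describe how minSemi-RF selects pseudo-labels or how it realizes its Tri-training-style behaviour. The sketch below therefore illustrates only the generic random forest self-training loop that such a method builds on. It is a minimal, stdlib-only sketch with several assumptions:

- The "forest" is a toy ensemble of bagged decision stumps (depth-1 trees, each restricted to a random feature subset) rather than full-depth trees.
- An unlabeled record is accepted only when every tree votes for the same label. This rule is chosen here purely to avoid a confidence threshold.
- The names `fit_stump`, `fit_forest`, `predict` and `self_train` are hypothetical.
- The toy features (average grade, fraction of credits earned) and the data are also hypothetical.

This is not the authors' minSemi-RF algorithm.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical illustration of generic random-forest self-training; NOT the minSemi-RF algorithm.
Row = Sequence[float]
Tree = Callable[[Row], int]


def fit_stump(X: List[Row], y: List[int], features: Sequence[int]) -> Tree:
    """Depth-1 tree: the (feature, midpoint threshold) split with the fewest training errors."""
    majority = Counter(y).most_common(1)[0][0]
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in features:
        values = sorted({x[f] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for x, lab in zip(X, y) if x[f] <= t]
            right = [lab for x, lab in zip(X, y) if x[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # every candidate feature is constant: predict the majority label
        return lambda x: majority
    _, bf, bt, bl, br = best
    return lambda x: bl if x[bf] <= bt else br


def fit_forest(X: List[Row], y: List[int], n_trees: int, rng: random.Random) -> List[Tree]:
    """Bagging plus a random feature subset (sqrt of the feature count) for each stump."""
    d = len(X[0])
    k = max(1, int(d ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(d), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest: List[Tree], x: Row) -> Tuple[int, float]:
    """Majority vote and the fraction of trees that agree with it."""
    label, votes = Counter(tree(x) for tree in forest).most_common(1)[0]
    return label, votes / len(forest)


def self_train(X_lab, y_lab, X_unlab, n_trees=15, max_rounds=10, seed=0):
    """Repeatedly pseudo-label the unlabeled rows on which all trees agree, then retrain."""
    rng = random.Random(seed)
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        forest = fit_forest(X, y, n_trees, rng)
        remaining = []
        for x in pool:
            label, agreement = predict(forest, x)
            if agreement == 1.0:  # unanimous vote: accept the pseudo-label
                X.append(x)
                y.append(label)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):  # no new pseudo-labels: stop early
            break
        pool = remaining
    return fit_forest(X, y, n_trees, rng), X, y


# Hypothetical toy records: (average grade, fraction of credits earned); 1 = at risk, 0 = on track.
X_lab = [(3.0, 0.2), (3.5, 0.3), (4.0, 0.4), (7.0, 0.8), (7.5, 0.9), (8.0, 1.0)]
y_lab = [1, 1, 1, 0, 0, 0]
X_unlab = [(3.2, 0.25), (3.8, 0.35), (3.4, 0.3), (7.2, 0.85), (7.8, 0.95), (7.6, 0.9)]

forest, X_all, y_all = self_train(X_lab, y_lab, X_unlab)
print(len(X_all) - len(X_lab), "records pseudo-labeled")
print("prediction for (3.6, 0.3):", predict(forest, (3.6, 0.3))[0])
```

Requiring unanimous agreement keeps the selection rule free of a confidence threshold, but it is conservative: a single dissenting tree blocks a pseudo-label. A practical version would more likely use full-depth trees, for example scikit-learn's `RandomForestClassifier`. It would also use whatever selection rule the chapter itself specifies.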


Metadata
Title
A Random Forest-Based Self-training Algorithm for Study Status Prediction at the Program Level: minSemi-RF
Authors
Vo Thi Ngoc Chau
Nguyen Hua Phung
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-49397-8_19