Introduction
The prediction of academic performance is one of the first and most popular subjects in the fields of learning analytics and educational data mining (Chatti, Dyckhoff, Schroeder, & Thüs,
2012; Peña-Ayala,
2014; Romero, Olmo, & Ventura,
2013). Academic performance can be defined as the score obtained by students from an evaluation made at the end of a learning activity. This score could belong to a short-term learning activity (i.e. a learning object), a lesson, an academic term, or to a complete educational process, i.e., GPA (Peña-Ayala,
2014). Studies have indicated that the learning traces left by students in online learning settings (participation in discussions, time spent on the learning material, etc.) are significant in predicting their academic performance.
Lopez, Luna, Romero, and Ventura (
2012) analysed the relationship between students’ academic performance and their interactions in the discussion forum of the Moodle learning management system. The researchers identified eight features of student participation in the discussion forum and used them to predict student success as pass or fail. Furthermore, they compared the performance of various classification algorithms based on classification accuracy. According to their findings, the Naive Bayes algorithm reached a classification accuracy of 89.4% using six features that were determined through a feature selection process. The study findings also revealed that students’ participation in the discussion forum in the Moodle setting can be a reliable feature in predicting their academic performance.
In another study carried out in the Moodle setting by Romero, Ventura, Hervás, and Gonzales (
2008), the performance of a group of classification algorithms was compared in order to observe which one would predict academic performance most accurately. The researchers selected nine different features reflecting students’ use of Moodle in addition to their participation data in the discussion forum. Students’ academic performance was coded as pass, fail, good, or excellent. Examination of the research findings showed that the classification accuracy of the algorithms varied in the range of 60 to 70%. The research results indicated that students’ low usage of the online learning system could be one of the reasons behind the low accuracies.
Osmanbegović and Suljić (
2012) established a classification model to predict the success of university students using data on their demographics and pre-university achievement. They used students’ final score in the “Management Informatics” course as the success feature. The researchers reported that the Naive Bayes algorithm correctly classified students at a rate of 76.65% on the given data set. In another study that aimed to predict student performance using interaction data obtained from the learning management system, Macfadyen and Dawson (
2010) identified fifteen features that had a statistically significant relationship with students’ academic performance. They used the logistic regression method to predict students’ academic performance from these features. The researchers stated that the established classification model accurately classified unsuccessful students at a rate of 81%. The number of messages sent to the discussion forum, the number of emails sent, and the number of completed assignments were identified as important features in predicting end-of-term grades. This study further indicated that pedagogically meaningful information can be produced from student interaction data obtained from learning management systems.
The majority of studies predicting academic performance were carried out using data collected at the end of the term. The results of those studies were valuable for identifying features that might lead to failure; however, they provided little information that could actually be used to prevent it. On the other hand, online learning settings constantly collect student interaction data. Prediction models built by analysing this data at earlier stages would be effective in preventing possible failures and would help instructional designers design and develop pedagogical intervention systems (Costa, Fonseca, Santana, de Araújo, & Rego,
2017), which are also referred to as early warning systems. In the following section, early warning systems are explained in detail from an educational perspective.
Educational early warning systems
Early warning systems can be regarded as the next step in the prediction of academic performance. The aim here is to predict academic performance at an earlier stage using features derived from online learning settings. As such, teachers may be able to identify students at risk of academic failure and offer assistance to help these students improve (Campbell, DeBlois, & Oblinger,
2007). In other words, the performance of students in lessons can be predicted in advance and possible failures prevented by timely interventions (Johnson, Smith, Willis, Levine, & Haywood,
2006).
Although educational early warning systems have received more attention in recent years, the Signal Project implemented at Purdue University (Indiana, USA) has been cited as a key example of a successful early warning system in the literature (K. E. Arnold,
2010). Instructors are provided with a Signal plug-in to be used in their online environments. This plug-in assigns a risk indicator to each student by analysing their interactions (e.g. whether students have read online course materials assigned by the professor, done practice assignments, attended tutorial sessions after class, or participated in online class discussions) with the aid of prediction algorithms. The risk indicator, structured as a traffic light, gives each student feedback on their lesson performance in the form of red (high probability of failure), yellow (medium probability of failure), and green (high probability of success) lights. A notable improvement in student success was observed in lessons that utilized the Signal system during the pilot scheme implemented at the university over a 2-year period. In the first week of a lesson attended by 220 students, 45 students were placed in the red (high risk) and 32 in the yellow (medium risk) categories. In the following weeks, 55% of students in the red group moved up to the yellow group (C level) and 24.4% to the green group (A or B level), while 10.6% remained in the red group. Of the students starting in the yellow group, 31% remained in the same group while 69% moved up to the green group.
Hu, Lo, and Shih (
2014) analysed students’ interaction data in an online lesson via data mining techniques in order to develop an early warning system that would predict students’ learning performance in a course. They built several classification models to predict students’ pass/fail status using data from weeks 4, 8, and 13. The researchers concluded that their early warning system successfully predicted students’ learning performance. The experimental data presented by the researchers indicated that the Classification and Regression Tree (CART) algorithm, supplemented by AdaBoost, achieved over 90% accuracy on all datasets. In addition, time-dependent features derived from the learning management system were reported to be significant in predicting student performance.
Costa et al. (
2017) benefited from machine learning techniques in order to reduce the high level of failure in the subject of Introductory Programming. The researchers investigated the effectiveness of four different prediction algorithms in the early identification of students with a high probability of failure. The researchers stated that the model established with the Support Vector Machine (SVM) algorithm was effective in the early prediction of unsuccessful students with a rate of 83%. At the same time, this study indicated that the pre-processing techniques were effective in increasing the performance of prediction algorithms.
Previous studies have shown that early prediction of academic failure is possible; however, the transferability and generalizability of prediction models across different courses are still limited. One of the major reasons for this is that the instructional conditions of a course also affect the performance of prediction models (Gašević, Dawson, Rogers, & Gasevic,
2016). Therefore, it is important to develop and validate prediction models for different courses. In this study, we aimed to develop an early warning system for a blended course and compared not only different algorithms but also the effects of different pre-processing techniques on the algorithms’ prediction performance. The feature sets used in the study also differ from those of existing studies. The results obtained here will be helpful in designing early warning systems for online learning environments and in exploring new features to be used in such systems.
The remainder of this paper is structured as follows: the second section provides information about the aims of the study and the research questions to be addressed. The third section explains the research methods, data pre-processing, data analysis techniques, and the online learning setting where the data was collected. The fourth section presents the findings. In the conclusion, the findings are discussed in the light of other research and suggestions are made for future research.
Discussion and conclusions
The current study aimed to develop a model that enables the prediction of students’ end-of-term academic performance earlier in the course using interaction data from an online learning setting. To that end, a two-stage analysis method was followed. In the first stage, the performance of the classification algorithms most widely cited in the literature was compared using the complete data set. At the same time, in the data pre-processing stage, the impact of different data transformation and feature selection techniques, which are known to affect classification performance, was tested. In the second stage, it was investigated whether the end-of-term performance of students could be predicted in earlier weeks using the selected algorithm, features, and data transformation techniques. At this stage, the performance of classification models formed with data obtained in weeks 3, 6, 9, 12, and 14 in predicting student academic performance was compared. Several performance metrics (e.g., Classification Accuracy, Sensitivity, Specificity, and F-Measure) were used to compare the performance of the different classification models obtained. End-of-term scores in the Computer Hardware course were taken as the indicator of student academic performance, which was the target feature. Various features reflecting students’ behaviour in the online learning setting were used to predict their end-of-term performance, and these features were determined by considering students’ interactions in the learning setting. Academic performance was coded as “Passed” or “Failed”.
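For reference, the four performance metrics named above can be computed directly from a binary confusion matrix. In the sketch below, "Failed" is treated as the positive class (an assumption on our part), and the counts are invented purely for illustration.

```python
# Accuracy, sensitivity, specificity, and F-measure for a binary
# (Passed/Failed) outcome, with "Failed" as the positive class.
def classification_metrics(tp, fn, fp, tn):
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)          # true-positive rate (recall)
    specificity = tn / (tn + fp)          # true-negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, f_measure

# Toy confusion matrix: 20 failing students caught, 5 missed,
# 10 passing students flagged, 45 correctly left alone.
acc, sens, spec, f1 = classification_metrics(tp=20, fn=5, fp=10, tn=45)
print(f"acc={acc:.2f} sens={sens:.2f} spec={spec:.2f} F={f1:.2f}")
```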
The effect of data transformation on classification performance was tested in the analysis stage. All features were converted into categorical form using the equal-frequency and equal-width techniques. Results indicated that models built with categorical data performed better than models built with continuous data. Moreover, when the two data transformation methods were compared, the equal-width technique produced better outcomes than the equal-frequency method. Previous studies likewise indicate that categorical data give rise to better outcomes than continuous data in classification analyses (Cristobal Romero, Espejo, et al.,
2013). At the same time, data converted into categorical form may make the obtained models easier for non-specialists to interpret. Furthermore, it increases the reusability of the established models on different data sets, in other words, their generalizability.
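The two discretization techniques compared above can be illustrated in a few lines: equal-width divides the value range into intervals of equal size, while equal-frequency places roughly the same number of students in each bin. The login counts and the choice of three bins below are arbitrary assumptions made for demonstration.

```python
# Equal-width vs. equal-frequency discretization of a continuous feature.
import numpy as np

logins = np.array([1, 2, 2, 3, 5, 8, 13, 40, 55, 60])  # made-up login counts

# Equal width: bin edges evenly spaced between the min and max values.
width_edges = np.linspace(logins.min(), logins.max(), 4)  # 3 bins
width_bins = np.digitize(logins, width_edges[1:-1])

# Equal frequency: bin edges at the 33rd and 67th percentiles.
freq_edges = np.quantile(logins, [1 / 3, 2 / 3])
freq_bins = np.digitize(logins, freq_edges)

print(width_bins.tolist())  # most values land in the lowest-width bin
print(freq_bins.tolist())   # values spread almost evenly across bins
```

Note how the skewed distribution leaves the equal-width bins unbalanced, while the equal-frequency bins stay close to the same size; which behaviour helps a classifier depends on the data.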
Secondly, the impact of feature selection on classification performance was tested. For this purpose, the ten features that received the highest scores according to three different feature selection methods (Gain Ratio, Gini Index, and SVM Weight) were identified, and the performance of classification models using these features was compared with that of models using all features (see
Appendix). The results indicated that the classification performance obtained with the 10 features selected by the Gini index method was higher than that obtained using all features. Feature selection is important because it enables prediction models to be built with fewer features, and models with fewer features can be interpreted more easily.
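A minimal sketch of the Gini-based ranking idea described above: each feature is scored by how much splitting on it reduces the Gini impurity of the pass/fail label, and the top-scoring features are kept. The synthetic data and the choice of k = 2 are our own assumptions, not the study's procedure.

```python
# Rank categorical features by Gini impurity reduction and keep the top k.
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_gain(x, y):
    """Impurity reduction when partitioning y by the categories of x."""
    gain = gini(y)
    for value in np.unique(x):
        mask = x == value
        gain -= mask.mean() * gini(y[mask])
    return gain

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 100)           # 0 = fail, 1 = pass (synthetic labels)
X = rng.integers(0, 3, (100, 5))      # 5 categorical features, 3 levels each
X[:, 2] = y                           # feature 2 deliberately leaks the label

scores = [gini_gain(X[:, j], y) for j in range(X.shape[1])]
top_k = np.argsort(scores)[::-1][:2]  # keep the 2 highest-scoring features
print(top_k.tolist())                 # the leaking feature ranks first
```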
Another intended purpose of the feature selection method is to gain an understanding of which features have the greatest impact on the structure (i.e. academic performance) that needs to be predicted (Baker, R. S. J. d.,
2010). When the findings relating to important features were examined, it was observed that features such as students’ regular logins to the learning setting, participation in the discussion forum, doing homework on a weekly basis, and reading and evaluating content written by other students came to the fore. These findings are consistent with other studies. For example, in a study conducted in the Moodle setting, Lopez et al. (
2012) reached the conclusion that the participation of students in the discussion forum is a good predictor of their success in lessons. Macfadyen and Dawson (
2010), on the other hand, established a regression model to predict students’ success in lessons using interaction data from the Blackboard learning setting. It was found that the most important feature in the model, which correctly predicted a notable majority of students at risk of failure, was the number of posts sent to the discussion forum. In the present study, the number of answers (response_create_count), an indicator of students’ participation in the discussion forum, along with the number of ratings given to written questions (question_rating_count), likewise emerged as important features in the prediction of student performance.
Lavoué (
2011) stated that the tagging feature used in online learning settings is important for learning new concepts, forming relationships between concepts, and supporting students’ cognitive and social learning processes by helping them see the reflections of their internal concepts. The tagging of written content (tag_used_count) and the addition of new tags to the setting (tag_created_count) emerged as important features in the prediction of student performance in this study as well. The Computer Hardware course in which the study was carried out fits this description, as it is dominated by concepts: students are expected to learn 150–200 concepts throughout the semester. From this perspective, the importance of tagging-related features for academic performance is encouraging for the use of tagging in concept-dominated subjects. In addition, students’ regular logins to the setting and the time they spent in it emerged as significant features. Akçapınar, Hasnine, Majumdar, Flanagan, and Ogata (
2019) indicated that logging into learning settings regularly has a greater impact on academic success than carrying out many activities in a single session.
In a comparison of the performance of classification algorithms developed using the complete data set, the highest classification accuracy was reached with the kNN and CN2 Rules algorithms, under the condition in which the data were converted into categorical form using the equal-width method and the important features were selected according to the Gini index. Under these circumstances, the kNN and CN2 Rules algorithms classified 86% of students correctly, while the other classification algorithms attained a classification accuracy of 76% and above. The classification model obtained with the kNN algorithm correctly predicted 24 of the 27 (89%) students who failed at the end of the term and 41 of the 49 (84%) students who passed. In contrast, it classified 8 students (16%) who actually passed as failed and 3 students (11%) who actually failed as passed.
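The percentages reported above for the kNN model follow directly from its confusion matrix; the short computation below reproduces them from the stated counts.

```python
# Reproducing the kNN model's reported figures from its confusion matrix:
# 24 of 27 failed students and 41 of 49 passed students classified correctly.
tp, fn = 24, 3   # failed students: correctly / incorrectly classified
tn, fp = 41, 8   # passed students: correctly / incorrectly classified

accuracy = (tp + tn) / (tp + fn + tn + fp)   # 65 / 76, rounds to 86%
sensitivity = tp / (tp + fn)                 # 24 / 27, rounds to 89%
specificity = tn / (tn + fp)                 # 41 / 49, rounds to 84%
print(f"{accuracy:.0%} {sensitivity:.0%} {specificity:.0%}")
```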
In accordance with the findings obtained in the first stage, classification models were formed with data obtained from weeks 3, 6, 9, 12, and 14. The results of the analysis indicated that the model established in the 3rd week correctly classified 20 of the 27 (74%) students who failed at the end of the term, but incorrectly classified 13 of the 49 (26%) students who passed the subject as failed. Error rates gradually declined in models formed from data obtained in later weeks. This result was expected, as data on student interactions in the system accumulate as the weeks progress. Costa et al. (
2017) reported that the prediction performance of algorithms increased notably once students had completed 50% of the subject.
One of the most important aims of educational data mining and learning analytics studies is to identify students with low performance or those experiencing problems and to enable the behaviour underlying these conditions to be changed through timely interventions (Kimberly E. Arnold & Pistilli,
2012). Feedback carries notable importance in the change of behaviour and the provision of instantaneous and individualised feedback to students is made possible through the developments in educational technologies (Bienkowski, Feng, & Means,
2012; Tanes, Arnold, King, & Remnet,
2011). The Signal project implemented by Purdue University indicated that students perceive individualised feedback given in an understandable format as positive (Bienkowski et al.,
2012). In that project, the researchers used the information obtained through data mining analyses to provide feedback to students and showed that this was effective in increasing student success (K. E. Arnold,
2010). The prediction of unsuccessful students at a rate of 74% using data obtained in a time frame as short as the 3rd week is significant with respect to the prevention of possible failures. This information will give teachers time to undertake pedagogical interventions (Costa et al.,
2017) and will help lower the failure rates when shared with the students within the context of an effective feedback system.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.