1 Introduction
2 Overview
2.1 Research Questions
- What are the measurable aspects of predicting students’ academic achievement in higher education?
- Which DM methods best predict students’ academic achievement in higher education?
2.2 Research Methodology
2.2.1 Search Strategy
2.2.2 Inclusion Criteria
- Studies conducted between 2007 and 2018.
- Studies that reported the data mining method used for the prediction.
- Studies that reported the features used for the prediction.
2.2.3 Exclusion Criteria
- Studies that focused on unsupervised analysis.
- Studies that did not include analysis or prediction of academic success.
- Studies not performed at the higher education level, e.g., in elementary or secondary school.
3 Overview of Features, Algorithms and Software Used for Prediction in EDM
3.1 Overview of Features Used in EDM
- Demographic features, which, as the name suggests, include gender, age, marital status, background, income, occupation, mobility (transportation), disability, parents’ education level, etc.
- Pre-enrollment features, which relate to students’ achievements before enrollment, such as their GPA from previous studies, previous major (field of study), previous institute, language proficiency, grades earned in prerequisite courses, and pre-enrollment exams, e.g., the Scholastic Aptitude Test (SAT) and the Graduate Record Examinations (GRE).
- Post-enrollment features, which relate to students’ behaviour after enrollment and during the course, such as attendance, assignments, scores earned in quizzes and final exams, lab work, note-taking in class, boredom level during lectures, the number of credit hours per semester, etc.
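Before any DM method is applied, these three feature groups are typically flattened into one numeric vector per student. A minimal sketch of that encoding step (all field names and values below are illustrative, not drawn from any cited study):

```python
# Hypothetical student record grouped into the three feature categories
# described above. Field names and values are invented for illustration.
record = {
    "demographic": {"gender": "F", "age": 21, "marital_status": "single"},
    "pre_enrollment": {"high_school_gpa": 3.4, "sat": 1250, "language_score": 88},
    "post_enrollment": {"attendance_pct": 92, "quiz_avg": 74, "credit_hours": 15},
}

def encode(record):
    """Flatten the grouped record into one numeric feature vector,
    binary-encoding the categorical demographic fields."""
    vec = []
    vec.append(1.0 if record["demographic"]["gender"] == "F" else 0.0)
    vec.append(float(record["demographic"]["age"]))
    vec.append(1.0 if record["demographic"]["marital_status"] == "married" else 0.0)
    vec.extend(float(v) for v in record["pre_enrollment"].values())
    vec.extend(float(v) for v in record["post_enrollment"].values())
    return vec

print(encode(record))  # one row of the training matrix
```

In practice the studies reviewed below differ mainly in which of these groups they draw from, not in this mechanical encoding step.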
3.2 Overview of Data Mining Methods Used in EDM
DM method | Definition | Pros. | Cons. |
---|---|---|---|
Decision trees | A classification method in which each internal node serves as a “test” on a feature, each branch serves as an outcome of the test, and each leaf node corresponds to a class label (the decision taken after computing all features) | Easy to understand; can handle missing values | Can suffer from overfitting; less accurate with continuous variables |
Support vector machine (SVM) | A supervised learning model with associated learning algorithms that analyzes data used for classification and regression analysis by applying the statistics of support vectors to categorize unlabeled data | High accuracy; can handle different data types; effective in high-dimensional spaces | Black box; training can be slow; high algorithmic complexity |
Naïve Bayes | A classification method that assumes the predictive features are conditionally independent and that there are no hidden features that could affect the prediction process | Simple to use; can deal with missing and noisy data | Black box; assumes that all features are independent and equally important |
Neural networks | A method that learns to perform tasks by considering examples, generally without being programmed with any task-specific rules | High accuracy; can handle missing and noisy data | Black box; difficulty dealing with big data; high complexity |
Logistic regression | A statistical model typically applied to a binary dependent variable. More formally, a logistic model is one where the log-odds of the probability of an event is a linear combination of independent (predictor) variables | Easy to understand; provides a probability outcome | Does not handle missing values well; can suffer from overfitting |
K-nearest neighbour | A classification and regression method that stores available cases and classifies new cases based on a similarity measure. A case is classified by a majority vote of its neighbours, being assigned to the class most common amongst its K nearest neighbours as measured by a distance function | Nonparametric; output easy to understand; robust to noisy training data | Black box; difficulty handling mixed data types; assumes that all features are equally important; sensitive to outliers |
Rule induction | An area of machine learning in which formal (if–then) rules are extracted from a set of data observations based on statistical significance | Low computational-space cost; makes effective use of statistical measures to combat noise | Slow training; has trouble recognizing exceptions or small, low-frequency regions of the space |
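As a concrete illustration of the logistic-regression definition in the table (log-odds as a linear combination of predictors), the following sketch fits a logistic model with plain stochastic gradient descent on a handful of invented pass/fail records. The features, values, and learning-rate settings are illustrative only:

```python
import math

# Toy data (invented): each x is [previous_gpa, attendance_fraction];
# y is 1 = pass, 0 = fail.
X = [[3.8, 0.95], [3.5, 0.90], [3.2, 0.85], [2.1, 0.40],
     [1.9, 0.55], [2.4, 0.30], [3.6, 0.80], [1.5, 0.35]]
y = [1, 1, 1, 0, 0, 0, 1, 0]

def sigmoid(z):
    if z < -60.0:
        return 0.0
    if z > 60.0:
        return 1.0
    return 1.0 / (1.0 + math.exp(-z))

# log-odds = w0 + w1*x1 + w2*x2 -- exactly the linear combination
# named in the table's definition of logistic regression.
w = [0.0, 0.0, 0.0]
lr = 0.5
for _ in range(2000):
    for xi, yi in zip(X, y):
        p = sigmoid(w[0] + w[1] * xi[0] + w[2] * xi[1])
        err = p - yi
        w[0] -= lr * err
        w[1] -= lr * err * xi[0]
        w[2] -= lr * err * xi[1]

def predict(xi):
    """Probability of passing for a feature vector xi."""
    return sigmoid(w[0] + w[1] * xi[0] + w[2] * xi[1])

print(round(predict([3.7, 0.9]), 3))  # strong student: probability near 1
print(round(predict([1.8, 0.4]), 3))  # weak student: probability near 0
```

The probability output is what makes this method attractive in the "pros" column above: predictions come with a confidence, not just a class label.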
3.3 Overview of Data Mining Software Used in EDM
DM tool and source | Programming language | Pros. | Cons. |
---|---|---|---|
RapidMiner (commercial) | Java | Supports command line and Graphical User Interface (GUI); displays visualizations; performs cross-validation at multiple levels; provides a wide range of metrics for model assessment | Limited functionality for engineering new features out of existing ones |
SPSS (commercial) | Java | Supports command line and GUI; easy process visualization; functionality for creating new features out of existing ones | Minimal support for modelling; less flexible than other tools; difficult to customize; slow in handling large data sets |
WEKA (open source) | Java | Supports command line and GUI; displays visualizations | Does not support the creation of new features |
KNIME (open source) | Java | Supports GUI; displays visualizations; can integrate data from multiple sources; has extensions allowing interfaces with R, Python, Java, and SQL | Does not support interactive execution; not all nodes can be streamed |
Orange (open source) | Python, Cython, C++, C | Supports command line and GUI; customizable visualizations; easy-to-understand interface | Limited in the scale of data it can work with, comparable to Excel; less suitable for big projects |
Spark MLlib (open source) | Scala, SQL, Java, R, Python | Displays visualizations; can connect with several programming languages through APIs | Purely programmatic tool (less usable for non-programmers) |
KEEL (open source) | Java | Supports command line and GUI; displays visualizations; supports discretization algorithms; feature selection support with a broad range of algorithms; extensive support for missing data | Limited functionality for engineering new features out of existing ones; limited support for clustering and factor analysis; limited support for association rule mining |
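Several of these tools (RapidMiner, WEKA, KNIME) automate model assessment through cross-validation. The underlying procedure is simple enough to sketch directly; this is an illustrative stdlib implementation of the index-splitting step, not any tool’s actual code:

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation.
    Every example lands in exactly one test fold; fold sizes differ
    by at most one when k does not divide n."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    indices, start = list(range(n)), 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Example: 10 samples split into 3 folds -> test folds of sizes 4, 3, 3.
for train, test in k_fold_indices(10, 3):
    print(len(train), len(test))
```

A model is then trained k times, once per (train, test) pair, and the k test accuracies are averaged, which is how the per-study accuracies reported later in this review are typically obtained.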
4 Comprehensive Review of Academic Achievement Prediction Literature
4.1 Features Used in Predicting Students’ Academic Achievement
4.1.1 Demographics
4.1.1.1 Gender
4.1.1.2 Age
4.1.1.3 Marital Status
4.1.1.4 Other Demographic Features
4.1.2 Pre-enrollment Features
4.1.2.1 GPA
4.1.2.2 Academic Language Skills
4.1.3 Post-enrollment Features
4.1.3.1 Grades
4.1.3.2 Results in Previous Semester
4.1.3.3 Attendance
4.1.3.4 Other Post-enrollment Features
4.2 Most Commonly Used Data Mining Methods in Predicting Students’ Achievement
4.3 Most Commonly Used Data Mining Tools in Predicting Students’ Achievement
4.4 Per-Task Analysis
4.4.1 Prediction of Students’ Academic Performance or GPA at a Degree Level
- Sembiring et al. (2011) sampled 300 students to predict the final grade of students from the faculty of computer systems and software engineering. They used innovative features that did not appear in the other studies. The significance of each feature was tested using multivariate analysis methods. They found that family support had the most impact (52.6% contribution) on the prediction, followed by engaging time, then study behaviour, and finally study interest. Students’ personal beliefs, on the other hand, had no impact.
- Kabakchieva (2013) used a dataset of 10,330 students to predict their performance using five classes (bad, average, good, very good, and excellent). They found that the classifiers perform differently for the five classes. Another finding is that features related to students’ university admission score and the number of failures at the first-year exams are among the most influential features in the classification.
- Abu Saa (2016) collected data from 270 students, using a survey distributed in daily classes and online, with the aim of predicting students’ performance in an IT department. They found that students’ performance does not depend solely on post-enrolment features, such as their academic efforts; on the contrary, many other features have equal or even greater influence. These include demographic features, such as gender and mother’s occupation, as well as pre-enrolment features, such as high school grade and university fee discount.
- Asif et al. (2017) predicted students’ performance using a sample of 210 undergraduate students, with marks as the only predictor features. The results of their study showed that it is possible to predict graduation performance in a four-year university program with reasonable accuracy using only pre-university marks and the marks of first- and second-year courses.
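Most of the studies summarized here rely on J48/CART-style decision trees, which at each node pick the split with the highest information gain. A minimal single-feature sketch of that split-selection step (the GPA values and pass/fail labels are invented):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(xs, ys):
    """Exhaustively pick the threshold on one numeric feature that
    maximizes information gain, as a J48-style tree does per node."""
    base = entropy(ys)
    best = (None, -1.0)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # degenerate split, skip
        gain = (base
                - (len(left) / len(ys)) * entropy(left)
                - (len(right) / len(ys)) * entropy(right))
        if gain > best[1]:
            best = (t, gain)
    return best

# Toy data: previous-semester GPA vs pass(1)/fail(0), values invented.
gpa    = [1.8, 2.0, 2.2, 2.6, 3.0, 3.3, 3.6, 3.9]
passed = [0,   0,   0,   1,   1,   1,   1,   1]
threshold, gain = best_split(gpa, passed)
print(threshold, round(gain, 3))
```

A full tree repeats this selection recursively over all features, which is why the method handles mixed feature sets like those in the table below so naturally.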
Author (s) | Prediction | Predictor features | DM method (s) | DM tool | Result |
---|---|---|---|---|---|
Nghe et al. (2007) | Predict students’ GPA at the end of the first year of their Master program using three models at the Asian Institute of Technology (AIT), Thailand | Demographics: gender, marital status, income, and age. Pre-enrolment features: academic institute, previous GPA, English proficiency, and TOEFL score | Decision trees (J48); Bayesian tree | Weka | J48 produced better accuracy: 91.98% for 2 classes (Fail/Pass), 67.74% for 3 classes (Fail/Good/Very Good), and 63.25% for 4 classes (Fail/Fair/Good/Very Good) |
Nghe et al. (2007) | Predict students’ GPA at the end of the third year using three models at Can Tho University (CTU), Vietnam | Demographics: gender, age, family, job, and religion. Pre-enrolment features: English skill, entry marks range, field of study, and faculty. Post-enrollment features: second-year GPA | Decision trees (J48); Bayesian tree | Weka | J48 produced better accuracy: 92.86% for 2 classes (Fail/Pass), 84.18% for 3 classes (Fail/Good/Very Good), and 66.69% for 4 classes (Fail/Fair/Good/Very Good) |
Yadav et al. (2011) | Predict computer master students’ performance at VBS Purvanchal University in Jaunpur, India | Post-enrollment features: attendance, test grade, seminar grade, assignment grade, and lab work | Decision trees (ID3, CART, and C4.5) | Weka | CART produced the best accuracy (56.25%), followed by ID3 (52.08%), then C4.5 (45.83%) |
Sembiring et al. (2011) | Predict the final grade of students from the faculty of computer systems and software engineering at the University of Malaysia Pahang (UMP) in Malaysia | Demographics: personal beliefs and family support. Post-enrollment features: interest, study behaviour, and engaging time | Support vector machine (SVM) | Rapid-Miner | SVM produced high accuracy (83%) |
Yadav and Pal (2012) | Predict Engineering students’ academic performance at VBS Purvanchal University in Jaunpur, India | Demographics: gender, medium of instruction, location, accommodation type, parents’ qualification, parents’ occupation, annual family income, etc. Pre-enrolment features: previous grades and admission type | Decision trees (ID3, CART, and C4.5) | Weka | C4.5 produced the best accuracy (67.77%), followed by ID3 and CART with the same accuracy (62.22%) |
Pal and Pal (2013) | Predict student performance at VBS Purvanchal University in Jaunpur, India | Demographics: gender, medium of instruction, college location, student accommodation, family size, family income, parents’ qualification, parents’ occupation, and category. Pre-enrolment features: high school grade, senior secondary grade, admission type, and BCA result | Decision trees (ID3, ADT); Bagging | Weka | ID3 produced the best accuracy (78%), followed by bagging (73%), then ADT (69.50%) |
Kabakchieva (2013) | Predict students’ performance (bad, average, good, very good, and excellent) at the University of National and World Economy (UNWE) in Bulgaria | Demographics: gender, age, student speciality, and current semester. Pre-enrolment features: place and profile of the secondary school, secondary school GPA, admission exam score, and admission year. Post-enrolment features: GPA achieved during the first year of university | Decision tree (J48); K-nearest neighbour; Bayesian; Rule induction (JRip, OneR) | Weka | J48 produced the best accuracy (66.5%), followed by JRip (63%), then KNN (60%) and Bayesian (≈ 60%), then finally OneR (54.5%) |
Abu Saa (2016) | Predict students’ performance in the IT Department at Ajman University of Science and Technology in Ajman, United Arab Emirates | Demographics: gender, nationality, first language, teaching language, living location, sponsorship, parent working in the university, student discounts, transportation method, family size, family income, parents’ marital status, parents’ qualifications, parents’ occupation, and number of friends. Pre-enrolment features: high school percentage. Post-enrollment features: previous semester GPA, number of credit hours, and average number of hours spent with friends per week | Decision trees (CART, C4.5, ID3, CHAID); Naïve Bayes | Rapid-Miner and Weka | CART produced the best accuracy (40%), followed by C4.5 (36.40%), then Naïve Bayes (35.19%), then CHAID (34.07%), then finally ID3 (33.33%) |
Asif et al. (2017) | Predict students’ performance using 2 classes (low/high) at the end of the third year of their IT bachelor degree in Pakistan | Pre-enrolment features: total marks in the High School Certificate, marks in mathematics, and sum of the marks in mathematics, physics, and chemistry in an entrance examination. Post-enrollment features: examination marks of all the courses taught in different academic years | Decision trees; Random forest; Rule induction; Naïve Bayes; Neural networks; K-nearest neighbour | Rapid-Miner | Naïve Bayes produced the best accuracy (83.65%), followed by 1-nearest neighbour (74.04%), then random forests (71.15%), then decision tree (69.23%), then neural networks (62.50%), then finally rule induction (55.77%) |
Yassein et al. (2017) | Predict students’ academic performance in Saudi Arabia | Post-enrollment features: assignments, attendance, lab work attendance, final exam mark, mid-exam grades, education type, and success rate | Decision tree (C4.5); Two-step clustering | SPSS Clementine | n/a |
4.4.2 Prediction of Students’ Failure or Drop Out of a Degree
- Pradeep and Thomas (2015) predicted bachelor student dropout using the records of 150 students enrolled in a Technology program. Interestingly, the number of features used was reduced from 67 to the best 13 using the attribute selection algorithms provided in the WEKA tool. The selected features were mostly post-enrollment features, such as attendance, taking notes in class, and scores in some courses. Features such as age, gender, and religion were discarded, as they had no effect on the overall prediction.
- Alemu Yehuala (2015) used 11,873 student records to predict university students at risk of failure. They found that the six main features determining students’ failure or success are the number of students in a class, the number of courses given in a semester, higher education, entrance certificate, a student’s examination result, and gender.
- Villwock et al. (2015) investigated the factors that may influence students’ decision to drop out of a Mathematics major. They observed that the courses contributing to dropouts in the major differ across years: considering only the subjects taken in the first year, the course that contributed most to dropouts was “Differential and Integral Calculus I”, while over the first two years it was “Finite Mathematics”. They also concluded that the work factor contributed most to the decision to drop out, which they attribute to working students having little time to devote to extracurricular study. Marital status and age contributed to the decision to drop out as well.
- Daud et al. (2017) used 776 student instances to predict the completion or dropout of students from multiple universities in Pakistan. Twenty-three features, selected by the feature extraction process, were chosen for the experiment. They concluded that the features most influential for predicting students’ performance are students’ natural gas expenditure, electricity expenditure, self-employment, and location.
- Aulck et al. (2017) used a dataset of over 32,500 students to predict student dropout in an Electrical Engineering department. Examining individual features revealed that the strongest predictors of dropout are the GPAs in math, English, chemistry, and psychology courses, the year of enrollment, and the birth year.
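Several of these studies rank candidate features before training (e.g., the 67-to-13 reduction in Pradeep and Thomas, or the per-feature analyses in Daud et al. and Aulck et al.). A simple filter-style stand-in for such attribute selection, ranking features by absolute correlation with the label — all names and values below are invented for illustration, and this is not WEKA’s actual algorithm:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_k_best(rows, labels, names, k):
    """Keep the k features whose |correlation| with the label is highest —
    a filter-style sketch of attribute selection."""
    scores = []
    for j, name in enumerate(names):
        col = [r[j] for r in rows]
        scores.append((abs(pearson(col, labels)), name))
    return [name for _, name in sorted(scores, reverse=True)[:k]]

# Invented data: attendance and quiz average track dropout; shoe size does not.
names = ["attendance", "quiz_avg", "shoe_size"]
rows = [[0.95, 80, 42], [0.90, 75, 39], [0.40, 50, 41],
        [0.30, 45, 43], [0.85, 70, 38], [0.35, 55, 40]]
dropout = [0, 0, 1, 1, 0, 1]
print(select_k_best(rows, dropout, names, 2))
```

Filter methods like this are cheap but ignore feature interactions; wrapper methods (re-training the classifier per candidate subset) are costlier but can catch them.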
Author (s) | Prediction | Predictor features | DM Method (s) | DM tool | Result |
---|---|---|---|---|---|
Pradeep and Thomas (2015) | Predict bachelor student dropout in a Technology programme at Mahatma Gandhi University (MG University) in Kerala, India | Pre-enrolment features: score obtained in Mathematics in 12th grade. Post-enrollment features: level of boredom in classes, attendance, taking notes in class, scores in Basic Electrical Engineering, Basic Electronics Information Technology, Basic Mechanical Engineering, Engineering Graphics, Engineering Mechanics, Engineering Mechanics1, and EC, and difficulty level in Engineering Mechanics1 | Decision trees (ADT, J48, Random tree, REP tree); Rule induction (JRip, NNge, OneR, Ridor) | Weka | ADT obtained the best accuracy (99.5%), followed by JRip (98.02%), then NNge and Random tree with the same accuracy (97.02%), then Ridor (96.53%), then REP tree (95.05%), then J48 (94.55%), then finally OneR (89.60%) |
Alemu Yehuala (2015) | Predict university students at risk of failure at Debre Markos University in Ethiopia | Demographics: gender, age, region, identity, and socio-family past. Pre-enrolment features: higher education, entrance examination result. Post-enrollment features: study field, college, semester GPA, number of students in class, participation in optional activities, meetings with lecturers, views on the academic context, teaching professors, registered courses, number of courses per semester, and total credit hours per semester | Decision tree (J48); Naïve Bayes | Weka | J48 produced better accuracy (91.62%–92.33%) than Naïve Bayes (86.3%–87.4%) |
Shakeel and Anwer Butt (2015) | Predict students likely to drop out and students needing further help at the University of Gujrat (UOG) in Pakistan | Demographics: gender. Pre-enrolment features: enrollment status. Post-enrollment features: registered courses, matriculation marks, intermediate marks, sessional marks, midterm marks, final term objective marks, final term subjective marks, compulsory subject marks, and general subject marks | Decision tree (J48); Naïve Bayes; Random forest; Bayesian logistic regression | Weka | Naïve Bayes produced the best accuracy (91.93%), then Random forest (88.71%), then J48 (87.09%), then Bayesian logistic regression (66.13%) |
Villwock et al. (2015) | Identify courses contributing to students’ decision to drop out of the Mathematics major at Universidade Estadual do Oeste do Paraná (UNIOESTE) in Brazil | Demographics: family income, domestic budget, expenses with the university, residence, housing, and more. Pre-enrolment features: result in the course of Differential and Integral Calculus I. Post-enrollment features: information on the courses taken in the first two years of the major | Decision tree (J48) | Weka | J48 produced high accuracy (91.84%) |
Daud et al. (2017) | Predict the completion or dropout of students from different universities in Pakistan | Demographics: gender, marital status, house ownership, scholarship, self-employment, electricity bill, natural gas bill, telephone bill, water bill, food expenses, miscellaneous expenditure, medical expenses, family expenditure on education, accommodation expenses, studying family members, dependent family members, family income, and family assets. Pre-enrolment features: previous institution type, previous program | Support vector machine; Bayes network; Naïve Bayes; Decision trees (C4.5, CART) | Weka | SVM produced the best accuracy (86.7%), followed by Bayes network and Naïve Bayes with the same accuracy (84.8%), then C4.5 (76.6%), then finally CART (71%) |
Aulck et al. (2017) | Predict student dropout using the first semester’s grades in the Electrical Engineering department at the Eindhoven University of Technology in the Netherlands | Demographics: gender, race, residency status, and birthdate. Pre-enrolment features: previous schooling (SAT and ACT scores, if available), transcript records. Post-enrolment features: received grades, taken classes, and the times at which they were taken | Logistic regression; Random forests; K-nearest neighbour | n/a | Logistic regression produced the best accuracy (66.59%), then k-nearest neighbour (64.60%), then random forests (62.24%) |
Kemper (2018) | Predict student dropout at KIT university in Karlsruhe, Germany | Demographics: gender, age, origin, and date of matriculation. Post-enrollment features: study status, exam ID, exam grade, exam result, average grade in all exams, average grade in passed exams, average grade in failed exams, count of all exams, count of passed exams, and count of failed exams | Logistic regression; Decision trees | n/a | Decision trees produced slightly better accuracy (91.3%) than logistic regression (90.08%) |
4.4.3 Prediction of Students’ Results on Particular Courses
- Kovačić (2010) collected data from 453 students to predict their performance in an “Information Systems” course. They tried to find out whether successful and unsuccessful students can be distinguished in terms of demographic features (such as gender, age, ethnicity, and disability) or by study environment (such as course program, faculty, or course block). Their results suggest that the information gathered during the enrolment process (demographics, secondary school, working status, and early enrolment) is not sufficient for an accurate distinction between successful and unsuccessful students.
- Osmanbegović et al. (2012) used a dataset of 257 student records to predict their performance in a “Business Informatics” course. They analysed the importance of each feature individually and found that GPA has the greatest impact on the output, followed by the entrance exam, then the study material, then the average weekly hours dedicated to studying. The number of household members, the distance of the residence from the faculty, and gender had the smallest impact.
- Huang and Fang (2013) used the data of 323 undergraduate students to predict their performance in a “Dynamics” course. One of their interesting findings is that the grades students earn in prerequisite courses might not truly reflect their knowledge of those topics: they may have taken the prerequisite courses years earlier, and by the time they take the dependent course, their knowledge of the prerequisite material may have improved.
- Badr et al. (2016) used 203 students’ records to predict their performance in a “Programming” course. They analysed the relationship between the programming course and the other courses and found that only the English courses had a direct effect on the prediction.
- Al luhaybi et al. (2018) collected data from 129 students to predict those at high risk of failure in four computer science core modules. The predicted class feature is the “Overall Grade”, the final grade obtained by the student in the targeted module. The overall grade has five possible values (A: Excellent, B: Very Good, C: Good, D: Acceptable, and F: Fail), which were merged into low, medium, and high risk of failure to improve the classification results. A significant finding of their study is that students’ qualifications at program entry have a high impact on their academic performance. They also found that some of the final grades in previous modules influence students’ academic results in the current modules.
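The class-merging step described in Al luhaybi et al. can be sketched as a simple lookup. The exact grade-to-band grouping below is an assumption for illustration (the study’s precise mapping is not given here); the point is that merging sparse classes yields denser ones that are easier to classify:

```python
from collections import Counter

# Assumed mapping of the five letter grades onto three risk-of-failure
# bands -- illustrative only, not the paper's documented grouping.
RISK_BAND = {"A": "low", "B": "low", "C": "medium", "D": "high", "F": "high"}

# Invented sample of overall grades for a module.
grades = ["A", "B", "B", "C", "C", "C", "D", "F", "F", "A"]
risk = [RISK_BAND[g] for g in grades]

print(Counter(grades))  # five sparse classes
print(Counter(risk))    # three denser classes, more examples per class
```

With 129 students split over five grades, some classes have very few examples; collapsing to three bands gives each class enough support for the classifiers compared in the table below.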
Author (s) | Prediction | Predictor features | DM method (s) | DM Tool | Result |
---|---|---|---|---|---|
Kovačić (2010) | Predict successful and unsuccessful students in an Information Systems course in New Zealand | Demographics: gender, age, ethnicity, disability, work status, early enrollment, course program, and course block. Pre-enrolment features: GPA of secondary school | Decision trees (CHAID, CART) | SPSS | CART produced better accuracy (60.5%) than CHAID (59.4%) |
Osmanbegović et al. (2012) | Predict students’ success in a Business Informatics course in the Faculty of Economics in Tuzla, Bosnia and Herzegovina | Demographics: gender, scholarship, marital status, income, distance, and number of household members. Pre-enrolment features: GPA, high school, entrance exam score. Post-enrollment features: grade importance, studying time, and internet access | Decision tree (J48); Naïve Bayes; Multilayer perceptron | Weka | Naïve Bayes produced the best accuracy (76.65%), followed by J48 (73.93%), then the multilayer perceptron (71.2%) |
Huang and Fang (2013) | Predict students’ academic performance in Engineering Dynamics at Utah State University in the USA | Pre-enrollment features: GPA, grades earned in four prerequisite courses (Engineering Statics, Calculus I, Calculus II, and Physics). Post-enrollment features: scores earned in three Dynamics mid-term exams | Multivariate linear regression; Neural networks; Support vector machine | SPSS | The developed predictive models have an average prediction accuracy of 86.8–90.7% |
Badr et al. (2016) | Predict students’ grades in a programming course for the KSU mathematics department in Riyadh, Saudi Arabia | Post-enrollment features — Model 1: grades in two English courses and two math courses; Model 2: grades in two English courses only | CBA rule-generation algorithm | LUCS-KDD | Model 2 produced better accuracy (67.33%) than Model 1 (62.75%) |
Al luhaybi et al. (2018) | Predict 2nd-year computer science students’ academic performance in four computer science core courses at Brunel University in London, UK | Demographics: gender, age, and country. Pre-enrolment features: previous institute, qualifications, enrollment status. Post-enrollment features: program name, chosen route, study mode, first-year final grades, fees status, and module-related data such as attendance and tutor | Decision tree (J48); Naïve Bayes | Weka and Java API | Naïve Bayes produced slightly better accuracy (78.79%) than J48 (77.3%) |