
Machine Learning Algorithms to Predict Potential Dropout in High School

  • Conference paper
Data Analytics and Management

Abstract

In a developing country like India, the growth of its citizens, and consequently the advancement of the nation, depend on the education they receive. However, the delivery of education has been hindered by considerable dropout rates, which have multiple social and economic consequences. It is therefore crucial to find ways to overcome this problem. The advent of machine learning and the availability of immense amounts of data have enabled the development of data science and, consequently, its application in educational institutions. Educational data mining enables educators to monitor students' needs and to provide the necessary response and counselling. In this paper, we use machine learning algorithms such as logistic regression, decision trees, K-nearest neighbours and random forest to predict whether a student will drop out or continue his/her education. The accuracy of each model is calculated and compared. The results show that ML techniques are useful in this domain, with random forest being the most accurate classifier for predicting dropout. Using this research as a basis, educational institutions can identify which students may need more attention and modify their teaching methods accordingly, working towards the end goal of a 0% dropout rate.
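The workflow the abstract describes (train several classifiers, predict dropout versus continuation, compare accuracy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The features `attendance_rate` and `average_grade` and all of the records are hypothetical and invented for the example; they are not the paper's data. Only K-nearest neighbours and logistic regression (trained by plain batch gradient descent) are shown, written with the Python standard library so the mechanics stay visible.

```python
import math
from collections import Counter

# HYPOTHETICAL toy records, invented for illustration (NOT the paper's dataset):
# features = (attendance_rate, average_grade), both scaled to [0, 1];
# label 1 = dropped out, 0 = continued.
TRAIN = [
    ((0.95, 0.85), 0), ((0.90, 0.75), 0), ((0.88, 0.80), 0), ((0.92, 0.65), 0),
    ((0.55, 0.40), 1), ((0.60, 0.35), 1), ((0.45, 0.50), 1), ((0.70, 0.30), 1),
]
TEST = [((0.93, 0.78), 0), ((0.50, 0.45), 1), ((0.85, 0.70), 0), ((0.58, 0.38), 1)]


def knn_predict(train, x, k=3):
    """K-nearest neighbours: majority vote among the k training records closest to x."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(train, lr=1.0, epochs=5000):
    """Logistic regression: fit P(dropout | x) = sigmoid(w . x + b)
    by batch gradient descent on the average log-loss."""
    n = len(train)
    w = [0.0] * len(train[0][0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in train:
            # Gradient of the log-loss w.r.t. the logit is (predicted - actual).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def logistic_predict(model, x, threshold=0.5):
    """Flag a student as a likely dropout when the predicted probability >= threshold."""
    w, b = model
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold)


def accuracy(predict, data):
    """Fraction of records whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in data) / len(data)


if __name__ == "__main__":
    model = train_logistic(TRAIN)
    print("KNN accuracy:", accuracy(lambda x: knn_predict(TRAIN, x), TEST))
    print("Logistic regression accuracy:", accuracy(lambda x: logistic_predict(model, x), TEST))
```

Decision trees and random forest (which the abstract reports as the most accurate classifier) are omitted to keep the sketch short. In practice, a library such as scikit-learn provides `LogisticRegression`, `KNeighborsClassifier`, `DecisionTreeClassifier` and `RandomForestClassifier` behind a common `fit`/`predict` interface, which makes this kind of side-by-side accuracy comparison straightforward.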

Author information

Correspondence to Rishabh Jain.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Makhloga, V.S., Raheja, K., Jain, R., Bhattacharya, O. (2021). Machine Learning Algorithms to Predict Potential Dropout in High School. In: Khanna, A., Gupta, D., Pólkowski, Z., Bhattacharyya, S., Castillo, O. (eds) Data Analytics and Management. Lecture Notes on Data Engineering and Communications Technologies, vol 54. Springer, Singapore. https://doi.org/10.1007/978-981-15-8335-3_17
