2018 | OriginalPaper | Chapter

Performance Analysis of Machine Learning Techniques on Big Data Using Apache Spark

Authors: Garima Mogha, Khyati Ahlawat, Amit Prakash Singh

Published in: Data Science and Analytics

Publisher: Springer Singapore

Abstract

Giving machines intelligence is a present-day need, and this need has driven the evolution of machine learning. Analyzing data with machine learning algorithms is an active research area, but the analysis runs into problems when the data is big data. This paper compares several classification-based machine learning algorithms, namely Decision Tree Learning, Naïve Bayes, Random Forest, and Support Vector Machines, on big data using Apache Spark. Accuracy is evaluated to determine which classification algorithm gives faster and better results.

Metadata
Title: Performance Analysis of Machine Learning Techniques on Big Data Using Apache Spark
Authors: Garima Mogha, Khyati Ahlawat, Amit Prakash Singh
Copyright Year: 2018
Publisher: Springer Singapore
DOI: https://doi.org/10.1007/978-981-10-8527-7_2
