2021 | OriginalPaper | Chapter

4. Study of Big Data Analytics Tool: Apache Spark

Authors: Gend Lal Prajapati, Rachana Raghuwanshi

Published in: Big Data Analytics in Cognitive Social Media and Literary Texts

Publisher: Springer Singapore

Abstract

In this chapter, we discuss machine learning and Big Data, along with sample applications, the machine learning process, and commonly used techniques such as classification and clustering. These techniques are used to explore, evaluate, and leverage data. We also discuss tools and techniques for developing machine learning schemes that learn from data (including Big Data). In addition, the role of distributed computing platforms such as Apache Spark in applying machine learning to Big Data is presented in detail. Apache Spark is a general-purpose cluster computing framework that works on the principle of distributed processing. It is open-source software designed for fast computation and can process data as soon as it is received. Apache Spark handles historical data through batch processing and incoming data through real-time (stream) processing. Machine learning is a subfield of Artificial Intelligence. Its main focus is on models that learn from experience (which, for machines, is data). For example, a machine learning model can learn to recognize an image of a dog by being shown a large number of images of dogs. In this chapter, we assume that the reader has a basic understanding of machine learning. After going through this chapter, readers will be able to:
i. Understand machine learning with Big Data, including its characteristics, sources, and applications.
ii. Understand the comparative working of Apache Spark.
iii. Analyze the various types of problems to identify suitable techniques.
iv. Develop models using open-source tools like Skill Network Lab and IBM cloud.
v. Explore problems of Big Data using machine learning techniques with Apache Spark.
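
The distributed-processing principle mentioned in the abstract can be illustrated with a small sketch. It is not Spark code and does not use any Spark API. It only mimics the general pattern, using Python's standard library: split the data into partitions, summarize each partition independently, then combine the partial results. The function names (`partition`, `partial_stats`, `combine`, `distributed_mean`) and the choice of four partitions are assumptions made for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(data, num_partitions):
    """Split a list into roughly equal contiguous chunks (hypothetical helper)."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_stats(chunk):
    """'Map' step: summarize one partition as a (sum, count) pair."""
    return (sum(chunk), len(chunk))


def combine(a, b):
    """'Reduce' step: merge two partial (sum, count) summaries."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(data, num_partitions=4):
    """Compute the mean by processing partitions independently, then merging."""
    chunks = partition(data, num_partitions)
    # Threads stand in for cluster workers here; a real cluster framework
    # would schedule each partition on a separate executor or machine.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total, count = reduce(combine, partials, (0, 0))
    return total / count if count else None


if __name__ == "__main__":
    print(distributed_mean(list(range(1, 101))))
```

Each partition is reduced to a small (sum, count) pair before anything is merged, so only the summaries need to be moved between workers. This is the property that makes the pattern suitable for data too large to fit on one machine.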
 


DOI: https://doi.org/10.1007/978-981-16-4729-1_4
