2019 | OriginalPaper | Book chapter

Processing Using Spark—A Potent of BD Technology

Written by: M. Venkatesh Saravanakumar, Sabibullah Mohamed Hanifa

Published in: Big Data Processing Using Spark in Cloud

Publisher: Springer Singapore


Abstract

Processing, accessing, analyzing, securing, and storing big data are the core concerns of big data (BD) technology. Within it, Spark serves as a core processing layer: an open-source, in-memory cluster computing platform and unified data processing engine that is fast and reliable for advanced analysis of all types of data. It can join different datasets across multiple disparate data sources. It supports in-memory computing and enables faster query access than disk-based engines like Hadoop. This chapter covers the main processing capabilities behind Spark and its related components: Resilient Distributed Datasets (RDDs), the scalable machine learning library (MLlib), Spark's incremental streaming pipeline, parallel graph computation through the GraphX interface, SQL DataFrames, Spark SQL (a data processing paradigm that supports columnar storage), and recommendation systems built with MLlib. All libraries operate on RDDs as the data abstraction, which makes them easy to compose within any application. RDDs are the major, fault-tolerant abstraction: they provide explicit support for data sharing across a user's computations, can capture a wide range of processing workloads, and can be manipulated in parallel across the cluster in a fault-tolerant manner. They are exposed through functional programming APIs in BD-supported languages such as Scala and Python. The chapter also discusses the core scalability of Spark for building high-level data processing libraries for future-generation applications. To understand and simplify BD tasks as a whole, with a focus on processing hindsight, insight, and foresight using Spark's core engine, the members of its ecosystem are explained in a clear, interpretable way, which is essential for data science compilers at this moment.
A deep dive into the relevant content (current big data tools in Spark, cloud storage) is undertaken in this work to remove bottlenecks in the development of efficient and comprehensive analytics applications.
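
The abstract's description of RDDs (a dataset manipulated through functional operations and recoverable after a failure) can be illustrated with a small sketch. The code below is not Spark and does not use the Spark API; `ToyRDD` and all of its methods are hypothetical names invented for this illustration. It is a single-process toy model, under the assumption that fault tolerance is achieved by remembering how each dataset was derived (its lineage) and recomputing it when the in-memory copy is lost.

```python
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Toy, single-process stand-in for an RDD (hypothetical, NOT the Spark API)."""

    def __init__(self,
                 source: Optional[List] = None,
                 parent: Optional["ToyRDD"] = None,
                 transform: Optional[Callable[[Iterable], Iterable]] = None):
        self._source = source        # base data, set only on the root dataset
        self._parent = parent        # lineage: the dataset this one derives from
        self._transform = transform  # lineage: how to derive it from the parent
        self._cache: Optional[List] = None  # in-memory copy, filled by cache()

    def map(self, fn: Callable) -> "ToyRDD":
        # Lazy: only records the step, nothing is computed yet.
        return ToyRDD(parent=self, transform=lambda rows: (fn(r) for r in rows))

    def filter(self, pred: Callable) -> "ToyRDD":
        # Lazy: only records the step, nothing is computed yet.
        return ToyRDD(parent=self,
                      transform=lambda rows: (r for r in rows if pred(r)))

    def _compute(self) -> List:
        if self._cache is not None:      # serve from memory when available
            return list(self._cache)
        if self._parent is None:         # root dataset: read the base data
            return list(self._source)
        # Otherwise replay the recorded step on the parent's result.
        return list(self._transform(self._parent._compute()))

    def cache(self) -> "ToyRDD":
        # Simplification: this toy materializes eagerly when cache() is called.
        self._cache = self._compute()
        return self

    def drop_cache(self) -> None:
        # Simulates losing the in-memory copy (for example, a failed node).
        self._cache = None

    def collect(self) -> List:
        return self._compute()


numbers = ToyRDD(source=list(range(10)))
even_squares = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()

first = even_squares.collect()   # served from the in-memory copy
even_squares.drop_cache()        # simulate the loss of that copy
second = even_squares.collect()  # rebuilt by replaying the lineage
```

Here `first` and `second` hold the same values even though the cached copy was discarded in between, which is the property the abstract refers to as fault tolerance. A real Spark program would distribute the data across a cluster and would use the Spark APIs in Scala or Python rather than this toy class.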


Metadata
Title
Processing Using Spark—A Potent of BD Technology
Written by
M. Venkatesh Saravanakumar
Sabibullah Mohamed Hanifa
Copyright year
2019
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-13-0550-4_9