Published in: The Journal of Supercomputing 9/2017

25.02.2017

Investigating Apache Hama: a bulk synchronous parallel computing framework

Authors: Kamran Siddique, Zahid Akhtar, Yangwoo Kim, Young-Sik Jeong, Edward J. Yoon

Abstract

The quantity of digital data is growing exponentially, and the task of efficiently processing such massive data is becoming increasingly challenging. Recently, academia and industry have recognized the limitations of the predominant Hadoop framework in several application domains, such as complex algorithmic computation, graph processing, and streaming data. This widely known map-shuffle-reduce paradigm has become a bottleneck in addressing the challenges of big data trends. The demand for research and development of novel massive computing frameworks is increasing rapidly, and a systematic illustration and analysis of these frameworks, together with highlights of potential research areas, are vital and much in demand among researchers in the field. Therefore, we explore one of the emerging and promising distributed computing frameworks, Apache Hama. It is a top-level project under the Apache Software Foundation and a pure bulk synchronous parallel model for processing massive scientific computations, e.g., graph, matrix, and network algorithms. The objectives of this contribution are twofold. First, we outline the current state of the art, distinguish the challenges, and frame some research directions for researchers and application developers. Second, we present real-world use cases of Apache Hama to illustrate its potential, specifically to the industrial community.

Metadata
Title
Investigating Apache Hama: a bulk synchronous parallel computing framework
Authors
Kamran Siddique
Zahid Akhtar
Yangwoo Kim
Young-Sik Jeong
Edward J. Yoon
Publication date
25.02.2017
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 9/2017
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-017-1987-9
