Published in: Telecommunication Systems 1/2019

26.04.2018

Designing a Hadoop system based on computational resources and network delay for wide area networks

Authors: Tomohiro Matsuno, Bijoy Chand Chatterjee, Nattapong Kitsuwan, Eiji Oki, Malathi Veeraraghavan, Satoru Okamoto, Naoaki Yamanaka


Abstract

This paper proposes a Hadoop system that considers both the processing capacity of each slave server and the network delay in wide area networks in order to reduce job processing time. The task allocation scheme in the proposed Hadoop system divides each job into multiple tasks using suitable splitting ratios and then allocates the tasks to different slaves according to the computational capability of each server and the availability of network resources. We incorporate software-defined networking into the proposed Hadoop system to manage path computation elements and network resources. The performance of the proposed Hadoop system is evaluated experimentally, using a scale-out approach, with fourteen machines located in different parts of the globe. The scale-out experiment runs both a single job and multiple jobs on the proposed and conventional Hadoop systems. The testbed and simulation results indicate that the proposed Hadoop system is more effective than the conventional Hadoop system in terms of processing time.
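The idea of choosing splitting ratios from both server capability and network delay can be illustrated with a small sketch. It is not the authors' formulation, which this page does not show. It assumes a deliberately simple, hypothetical cost model in which slave i finishes its share x_i of a job of size S at time delay_i + x_i * S / rate_i, and it picks the ratios that minimize the latest finish time. The names `Slave`, `rate`, `delay`, and `split_ratios` are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    name: str     # assumed to be unique
    rate: float   # hypothetical processing rate, data units per second (> 0)
    delay: float  # hypothetical fixed network delay to this slave, seconds


def split_ratios(slaves, job_size):
    """Return (ratios, finish_time) under the assumed cost model.

    Assumed model (not taken from the paper): slave i finishes at
    delay_i + x_i * job_size / rate_i. The latest finish time is minimized
    when every slave that receives work finishes at the same time T.
    """
    if job_size <= 0 or not slaves:
        raise ValueError("need a positive job size and at least one slave")

    total_rate = 0.0       # sum of rates of the slaves used so far
    weighted_delay = 0.0   # sum of rate_i * delay_i over those slaves
    finish = float("inf")  # common finish time T of the slaves used so far

    # Consider slaves in order of increasing delay. A slave whose delay is
    # not below the current finish time cannot help and gets no work.
    for s in sorted(slaves, key=lambda s: s.delay):
        if s.delay >= finish:
            break
        total_rate += s.rate
        weighted_delay += s.rate * s.delay
        finish = (job_size + weighted_delay) / total_rate

    # Share of each slave: the work it can complete between its delay and T.
    ratios = {
        s.name: max(0.0, (finish - s.delay) * s.rate / job_size)
        for s in slaves
    }
    return ratios, finish
```

Under this model a fast but distant slave may receive a smaller share than a slower nearby one, or none at all, which is the trade-off the abstract describes. A real allocator would also need to account for bandwidth-dependent transfer time and for several jobs sharing the same slaves, both of which this sketch ignores.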


Metadata
Title
Designing a Hadoop system based on computational resources and network delay for wide area networks
Authors
Tomohiro Matsuno
Bijoy Chand Chatterjee
Nattapong Kitsuwan
Eiji Oki
Malathi Veeraraghavan
Satoru Okamoto
Naoaki Yamanaka
Publication date
26.04.2018
Publisher
Springer US
Published in
Telecommunication Systems / Issue 1/2019
Print ISSN: 1018-4864
Electronic ISSN: 1572-9451
DOI
https://doi.org/10.1007/s11235-018-0464-y
