Published in: The Journal of Supercomputing 2/2019

27.03.2017

Prefetching-based metadata management in Advanced Multitenant Hadoop

Authors: Minh Chau Nguyen, Heesun Won, Siwoon Son, Myeong-Seon Gil, Yang-Sae Moon

Abstract

Metadata management is an essential part of Apache Hadoop. Optimizing metadata accesses improves the storage, processing, and analysis of big data, especially in multitenant environments. However, as the environment grows more complex, metadata management becomes more challenging and costly because of severe performance issues. In this paper, we propose a novel prefetching-based approach to improve the performance of metadata management for Hadoop in multitenant environments. We build metadata access graphs from historical access values, define access patterns, and then prefetch the items likely to be needed by near-future requests in order to minimize latency. We present a formal algorithm for applying the prefetching mechanism to Hadoop and implement it on a recent Hadoop system. Experimental results show that the proposed approach achieves high performance for metadata management while maintaining advanced multitenancy features.
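
The abstract does not give the formal algorithm, so the following is only a minimal sketch of the general idea it describes: build an access graph from historical access sequences, then prefetch the likely successors of each requested item. Everything in it is an assumption made for illustration, including the names AccessGraph and PrefetchingCache, the first-order successor counting, the top_k and min_prob parameters, and the backend callable standing in for the slow metadata store. It is not the authors' implementation.

```python
from collections import defaultdict


class AccessGraph:
    """Directed graph; edge a -> b counts how often b was requested right after a."""

    def __init__(self):
        self.edges = defaultdict(lambda: defaultdict(int))

    def record(self, history):
        # Count every consecutive pair in one historical access sequence.
        for current, following in zip(history, history[1:]):
            self.edges[current][following] += 1

    def predict(self, item, top_k=2, min_prob=0.3):
        # Return up to top_k successors whose observed frequency is >= min_prob.
        successors = self.edges.get(item, {})
        total = sum(successors.values())
        if total == 0:
            return []
        ranked = sorted(successors.items(), key=lambda kv: (-kv[1], kv[0]))
        return [name for name, count in ranked[:top_k] if count / total >= min_prob]


class PrefetchingCache:
    """Cache that loads predicted successors whenever an item is requested."""

    def __init__(self, graph, backend):
        self.graph = graph
        self.backend = backend  # hypothetical callable: key -> metadata value
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            value = self.cache[key]
        else:
            self.misses += 1
            value = self.backend(key)
            self.cache[key] = value
        # Prefetch likely next items so near-future requests hit the cache.
        for nxt in self.graph.predict(key):
            if nxt not in self.cache:
                self.cache[nxt] = self.backend(nxt)
        return value


if __name__ == "__main__":
    graph = AccessGraph()
    graph.record(["/tenantA", "/tenantA/logs", "/tenantA/logs/part-0"])
    graph.record(["/tenantA", "/tenantA/logs", "/tenantA/logs/part-1"])
    graph.record(["/tenantA", "/tenantA/conf"])

    cache = PrefetchingCache(graph, backend=lambda key: {"path": key})
    cache.get("/tenantA")       # miss; prefetches its likely successors
    cache.get("/tenantA/logs")  # served from the prefetched entry
    print(cache.hits, cache.misses)
```

In this toy run the second request is answered from the cache because the first request prefetched it. A real system would also need eviction, invalidation when metadata changes, and per-tenant isolation, none of which the sketch attempts.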

Metadata
Title
Prefetching-based metadata management in Advanced Multitenant Hadoop
Authors
Minh Chau Nguyen
Heesun Won
Siwoon Son
Myeong-Seon Gil
Yang-Sae Moon
Publication date
27.03.2017
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 2/2019
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-017-2019-5
