2016 | OriginalPaper | Book chapter

A Plugin-Based Approach to Exploit RDMA Benefits for Apache and Enterprise HDFS

Authors: Adithya Bhat, Nusrat Sharmin Islam, Xiaoyi Lu, Md. Wasi-ur-Rahman, Dipti Shankar, Dhabaleswar K. (DK) Panda

Published in: Big Data Benchmarks, Performance Optimization, and Emerging Hardware

Publisher: Springer International Publishing

Abstract

The Hadoop Distributed File System (HDFS) is widely used as the underlying storage engine by many Big Data processing frameworks, such as Hadoop MapReduce, HBase, Hive, and Spark. This makes the performance of HDFS a primary concern in the Big Data community. Recent studies have shown that HDFS cannot fully exploit the performance benefits of RDMA-enabled high-performance interconnects like InfiniBand. To address this, RDMA-enabled HDFS designs have been proposed in the literature that perform better on RDMA-enabled networks. However, these designs are tightly integrated with specific versions of the Apache Hadoop distribution and cannot easily be used with other Hadoop distributions. In this paper, we propose an efficient RDMA-based plugin for HDFS that can be easily integrated with various Hadoop distributions and versions, such as Apache Hadoop 2.5 and 2.6, Hortonworks HDP, and Cloudera CDH. Performance evaluations show that our plugin brings the performance expected of the hybrid RDMA-enhanced design, an improvement of up to 3.7x in TestDFSIO write, to all of these distributions. We also demonstrate that our RDMA-based plugin can achieve up to 4.6x improvement over the Mellanox R4H (RDMA for HDFS) plugin.
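As a rough illustration of the plugin idea described in the abstract, the sketch below shows one generic way a storage layer can select a transport implementation from configuration and fall back to a default when no plugin is available. It is written in Python for brevity and is not taken from the paper or from Hadoop: the class names, the `transport.plugin` key, and the registry are all hypothetical, and no real RDMA calls are made.

```python
# Hypothetical sketch of a configuration-selected transport plugin.
# None of these names come from Hadoop or from the paper.
from abc import ABC, abstractmethod

REGISTRY = {}  # maps a plugin name to a transport class


def register(name):
    """Class decorator that records a transport under the given name."""
    def wrap(cls):
        REGISTRY[name] = cls
        return cls
    return wrap


class BlockTransport(ABC):
    """Interface the core file system code is written against."""

    @abstractmethod
    def send_block(self, data: bytes) -> str:
        ...


@register("socket")
class SocketTransport(BlockTransport):
    """Stand-in for the default socket-based data path."""

    def send_block(self, data: bytes) -> str:
        return f"socket:{len(data)}"


@register("rdma")
class RdmaTransport(BlockTransport):
    """Stand-in for an RDMA-backed plugin (assumption: simulated only)."""

    def send_block(self, data: bytes) -> str:
        return f"rdma:{len(data)}"


def load_transport(conf: dict) -> BlockTransport:
    """Return the transport named in conf, or the default if unknown."""
    name = conf.get("transport.plugin", "socket")  # hypothetical key
    cls = REGISTRY.get(name, SocketTransport)
    return cls()


if __name__ == "__main__":
    print(load_transport({}).send_block(b"abcd"))
    print(load_transport({"transport.plugin": "rdma"}).send_block(b"abcd"))
```

Because the core code depends only on the `BlockTransport` interface, the same plugin class could in principle be dropped into different distributions without changing their source, which is the portability property the abstract emphasizes.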

Metadata
Title
A Plugin-Based Approach to Exploit RDMA Benefits for Apache and Enterprise HDFS
Authors
Adithya Bhat
Nusrat Sharmin Islam
Xiaoyi Lu
Md. Wasi-ur-Rahman
Dipti Shankar
Dhabaleswar K. (DK) Panda
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-29006-5_10