Published in: The Journal of Supercomputing 5/2015

1 May 2015

An approach of fast data manipulation in HDFS with supplementary mechanisms

Authors: Youwei Wang, Can Ma, Weiping Wang, Dan Meng


Abstract

The Hadoop framework is widely deployed on clusters of commodity hardware to build scalable, powerful systems for massive data processing. The Hadoop Distributed File System (HDFS), Hadoop's distributed storage component, is responsible for managing vast amounts of data effectively in large clusters. To use Map/Reduce, Hadoop's parallel processing infrastructure, the traditional workflow must first upload data from local file systems to HDFS. Unfortunately, when the data set is massive, this upload becomes extremely time-consuming, which causes almost intolerable delays for urgent tasks and wastes space on replicated data. The primary contribution of this paper is Zput, together with its supplementary mechanism, Zport. After describing the implementation, we introduce several refinements that are significant for runtime efficiency and performance. Evaluation results show that Zput accelerates the local data uploading procedure by over 315.4 %, while Zport boosts remote block distribution by over 190.3 %. Compatibility with upper-layer applications remains intact.
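To put the reported percentages in perspective, the following minimal sketch converts a percentage acceleration into elapsed time. It rests on an assumption the abstract does not confirm: that "accelerated by p %" means throughput rises to (1 + p/100) times the baseline. The function name `time_after_speedup` and the 600-second baseline are hypothetical and chosen only for illustration; only the two percentages come from the abstract.

```python
def time_after_speedup(baseline_seconds: float, speedup_percent: float) -> float:
    """Return the elapsed time after throughput rises by speedup_percent.

    Assumption (not stated in the abstract): "accelerated by p %" means
    throughput becomes (1 + p/100) times the baseline, so elapsed time
    shrinks by the same factor.
    """
    if baseline_seconds < 0 or speedup_percent <= -100:
        raise ValueError("baseline must be >= 0 and speedup must exceed -100 %")
    return baseline_seconds / (1.0 + speedup_percent / 100.0)


# Hypothetical 600 s baseline; 315.4 and 190.3 are the figures from the abstract.
zput_seconds = time_after_speedup(600.0, 315.4)
zport_seconds = time_after_speedup(600.0, 190.3)
print(f"Zput upload:        {zput_seconds:.1f} s")
print(f"Zport distribution: {zport_seconds:.1f} s")
```

Under this reading, an acceleration above 300 % corresponds to finishing in less than a quarter of the baseline time. If the authors instead measured the reduction in elapsed time, the conversion would differ.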


Metadata
Title
An approach of fast data manipulation in HDFS with supplementary mechanisms
Authors
Youwei Wang
Can Ma
Weiping Wang
Dan Meng
Publication date
1 May 2015
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 5/2015
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-014-1287-6
