Top

Cluster Computing

Published in:

01-03-2015

Scaling up MapReduce-based Big Data Processing on Multi-GPU systems

Authors: Hai Jiang, Yi Chen, Zhi Qiao, Tien-Hsiung Weng, Kuan-Ching Li

Published in: Cluster Computing | Issue 1/2015

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

MapReduce is a popular data-parallel processing model encompassed with recent advances in computing technology and has been widely exploited for large-scale data analysis. The high demand on MapReduce has stimulated the investigation of MapReduce implementations with different architectural models and computing paradigms, such as multi-core clusters, Clouds, Cubieboards and GPUs. Particularly, current GPU-based MapReduce approaches mainly focus on single-GPU algorithms and cannot handle large data sets, due to the limited GPU memory capacity. Based on the previous multi-GPU MapReduce version MGMR, this paper proposes an upgrade version MGMR++ to eliminate GPU memory limitation and a pipelined version, PMGMR, to handle the Big Data challenge through both CPU memory and hard disks. MGMR++ is extended from MGMR with flexible C++ templates and CPU memory utilization, while PMGMR fine-tuned the performance through the latest GPU features such as streams and Hyper-Q as well as hard disk utilization. Compared to MGMR (Jiang et al., Cluster Computing 2013), the proposed schemes achieve about 2.5-fold performance improvement, increase system scalability, and allow programmers to write straightforward MapReduce code for Big Data.

previous article Investigating an ontology-based approach for Big Data analysis of inter-dependent medical and oral health conditions

next article Energy-efficient data replication in cloud computing datacenters

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Jiang, H., Chen, Y., Qiao, Z., Li, K.-C., Ro, W., Gaudiot, J.-C.: Accelerating MapReduce framework on multi-GPU systems. Cluster Computing, pp. 1–9. Springer, Berlin (2013)

Cubieboards: an Open ARM Mini PC, http://www.cubieboard.org 2014

CUDA Programming Guide 6.0, NVIDIA, 2014

Dean, Jeffrey, Ghemawa, Sanjay: MapReduce: simplied data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef

Chen, Y., Qiao, Z., Jiang, H., Li, K.-C., Ro, W.W.: MGMR: multi-GPU based MapReduce. Grid and Pervasive Computing. Lecture Notes in Computer Science, vol. 7861, pp. 433–442. Springer, Berlin (2013)CrossRef

Bollier, D., Firestone, C.M.: The Promise and Peril of Big Data. Communications and Society Program. Aspen Institute, Washington, DC (2010)

Jinno, R., Seki, K., Uehara, K.: Parallel distributed trajectory pattern mining using MapReduce. In: Proceedings of IEEE 4th International Conference on Cloud Computing Technology and Science, pp. 269–273, 2012

Lee, D., Dinov, I., Dong, B., Gutman, B., Yanovsky, I., Toga, A.W.: CUDA optimization strategies for compute-and memory-bound neuroimaging algorithms. Comput. Methods Programs Biomed. 106, 175 (2012)CrossRef

Raina, R., Madhavan, A., Ng, A.D.: Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th International Conference on Machine Learning, Canada, 2009

10.

Fadika, z., Dede, E., Hartog, J., Govindaraju, M.: Marla: Mapreduce for heterogeneous clusters. In: Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 49–56, 2012

11.

Stuart, J.A., Owens, J.D.: Multi-GPU MapReduce on GPU clusters. In: Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium, pp. 1068–1079, 2011

12.

Foster, I., Kesselman, C.: The Grid 2: blueprint for a new computing infrastructure, Morgan Kaufmann, 2003

13.

Czajkowski, K., Fitzgerald, S., Foster, I., Kesselman, C.: Grid information services for distributed resource sharing. In: Proceedings of 10th IEEE International Symposium on High Performance Distributed Computing, pp. 181–194, 2001

14.

White, T.: Hadoop: The Definitive Guide. O’Reilly Media, Sebastopol (2012)

15.

Chen, L., Huo, X., Agrawal, G.: Accelerating MapReduce on a coupled CPU-GPU architecture. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis 2012

16.

Nakada, H., Ogawa, H., Kudoh, T.: Stream processing with big data: SSS-MapReduce. In: Proceedings of 2012 IEEE 4th International Conference on Cloud Computing Technology and Science, pp. 618–621, 2012

17.

Ji, F., Ma, X.: Using shared memory to accelerate MapReduce on graphics processing units. In: Proceedings of the IEEE International Parallel & Distributed Processing Symposium, pp. 805–816, 2011

18.

Chen, L., Agrawal, G.: Optimizing MapReduce for GPUs with effective shared memory usage. In: Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, pp. 199–210, 2012

19.

Shainer, G., Ayoub, A., Lui, P., Liu, T., Kagan, M., Troot, C.R., Scantlen, G., Crozier, P.S.: The development of Mellanox/NVIDIA GPU Direct over InfiniBand new model for GPU to GPU communications. Computer Science-Research and Development, pp. 267–273. Springer, Berlin (2011)

20.

Fang, Wenbin, He, Bingsheng, Luo, Qiong, Govindaraju, Naga K.: Mars: Accelerating MapReduce with Graphics Processors. IEEE Trans. Parallel Distrib. Syst. 22(4), 608–620 (2011)CrossRef

21.

Elteir, M., Lin, H., Feng, W.C., Scogland, T.R.W: StreamMR: an optimized MapReduce framework for AMD GPUs. In: IEEE 17th International Conference on Parallel and Distributed Systems, pp. 364–371, 2011

22.

Tuning CUDA Applications for Kepler, http://docs.nvidia.com/cuda/kepler-tuning-guide/

23.

Nathan, B., Jared, H.: Thrust: a productivity-oriented library for CUDA. In: GPU Computing Gems: Jade Edition, Morgan Kaufmann, pp. 359–371, 2011

24.

Xiaobo, L., Paul, L., Jonathan, S., John, S., Sze, W.P., Hanmao, S.: On the versatility of parallel sorting by regular sampling. Parallel Comput. 19(10), 1079–1103 (1993)CrossRefMATHMathSciNet

25.

Bartosz, P.: A fast approximation algorithm for the subset-sum problem. Int. Trans. Oper. Res. 9(4), 437–459 (2002)CrossRefMATHMathSciNet

26.

FERMI Compute Architecture White Paper, Nvidia

27.

Shi, Y., Léon-Charles, T., De, M.B., Yves, M.: Optimized data fusion for kernal k-means clustering. IEEE Trans. Pattern Anal. Mach. Intell. 34(5), 1031–1039 (2012)CrossRef

Title: Scaling up MapReduce-based Big Data Processing on Multi-GPU systems
Authors: Hai Jiang
Yi Chen
Zhi Qiao
Tien-Hsiung Weng
Kuan-Ching Li
Publication date: 01-03-2015
Publisher: Springer US
Published in: Cluster Computing / Issue 1/2015
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-014-0400-1

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Other articles of this Issue 1/2015

DI-MMAP—a scalable memory-map runtime for out-of-core data-intensive applications

Mobile, ubiquitous multimedia and digital convergence

A semantic enhanced Power Budget Calculator for distributed computing using IEEE 802.3az

Designing interactive narratives for mobile augmented reality

A blog ranking algorithm using analysis of both blog influence and characteristics of blog posts

Parallel data intensive applications using MapReduce: a data mining case study in biomedical sciences

Premium Partner