Published in: International Journal of Parallel Programming 3/2018

27-03-2017

Multilevel Data Processing Using Parallel Algorithms for Analyzing Big Data in High-Performance Computing

Authors: Awais Ahmad, Anand Paul, Sadia Din, M. Mazhar Rathore, Gyu Sang Choi, Gwanggil Jeon

Abstract

The growing gap between users and Big Data analytics calls for innovative tools that address the challenges posed by the volume, variety, and velocity of big data. Analyzing such massive volumes of data is computationally inefficient. Moreover, advances in Big Data applications and data science pose additional challenges, for which High-Performance Computing solutions have become a key issue and have attracted attention in recent years. However, existing systems are either memoryless or computationally inefficient. There is therefore a need for a system that can efficiently analyze a stream of Big Data within those requirements. This paper presents a system architecture that enhances traditional MapReduce by incorporating a parallel processing algorithm. A complete four-tier architecture is also proposed that efficiently aggregates the data, eliminates unnecessary data, and analyzes the data with the proposed parallel processing algorithm. The proposed system architecture supports both read and write operations, which enhances the efficiency of Input/Output operations. To check the efficiency of the proposed algorithms, we implemented the system using Hadoop and MapReduce. MapReduce is supported by a parallel algorithm that efficiently processes huge data sets. The system is implemented using the MapReduce tool on top of Hadoop parallel nodes to generate and process graphs in near real time. The system is evaluated in terms of efficiency by considering system throughput and processing time. The results show that the proposed system is more scalable and efficient.
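
The abstract describes a pipeline that eliminates unnecessary data and then analyzes the remainder with a MapReduce-style parallel algorithm. The following is a minimal Python sketch of that general pattern only, not the authors' algorithm or their Hadoop implementation. The function names, the word-count workload, and the use of a thread pool as a stand-in for Hadoop worker nodes are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def filter_records(records):
    """Filtration step (hypothetical): drop empty or whitespace-only records."""
    return [r for r in records if r and r.strip()]


def split_chunks(records, n_chunks):
    """Split records into at most n_chunks contiguous chunks of similar size."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """Map step: count word occurrences within a single chunk."""
    counts = Counter()
    for record in chunk:
        counts.update(record.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts."""
    return left + right


def run_pipeline(records, n_workers=4):
    """Filter, map chunks concurrently, then reduce the partial results.

    Threads stand in for cluster nodes here (an assumption); a real
    deployment would distribute the map tasks across Hadoop workers.
    """
    cleaned = filter_records(records)
    chunks = split_chunks(cleaned, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())
```

Calling `run_pipeline(["a b a", "", "b c"], n_workers=2)` would filter out the empty record, count words in each chunk independently, and merge the partial counts. Because the map step has no shared state, the same structure carries over to process-based or cluster-based execution.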

Metadata

Title: Multilevel Data Processing Using Parallel Algorithms for Analyzing Big Data in High-Performance Computing
Authors: Awais Ahmad, Anand Paul, Sadia Din, M. Mazhar Rathore, Gyu Sang Choi, Gwanggil Jeon
Publication date: 27-03-2017
Publisher: Springer US
Published in: International Journal of Parallel Programming, Issue 3/2018
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI: https://doi.org/10.1007/s10766-017-0498-x
