Published in: Wireless Personal Communications 3/2017

13-01-2017

Performance Improvement of MapReduce for Heterogeneous Clusters Based on Efficient Locality and Replica Aware Scheduling (ELRAS) Strategy

Authors: J. V. Bibal Benifa, Dejey


Abstract

MapReduce is a parallel programming model for processing data-intensive applications in a cloud environment. The scheduler greatly influences the performance of the MapReduce model when it is used in a heterogeneous cluster environment. The dynamic nature of the cluster environment and of the computing workloads affects execution time and computational resource usage in the scheduling process. Further, data locality is essential for reducing total job execution time and cross-rack communication, and for improving throughput. In the present work, a scheduling strategy named efficient locality and replica aware scheduling (ELRAS), integrated with an autonomous replication scheme (ARS), is proposed to enhance data locality and to perform consistently in a heterogeneous environment. ARS autonomously decides which data object to replicate by considering its popularity, and removes the replica when it becomes idle. The proposed approach is validated in a heterogeneous cluster environment with various realistic applications covering IO-bound, CPU-bound, and mixed workloads. ELRAS improves throughput by a factor of about 2 compared with the existing FIFO scheduler, and it also yields near-optimal data locality, reduced execution time, and effective utilization of resources. The simplicity of the ELRAS algorithm makes it feasible to adopt for a wide range of applications.
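The abstract does not spell out how ARS measures popularity or idleness, so the following is only a minimal sketch of the general idea (replicate a data object once it becomes popular, drop the added replica once it goes idle). It is not the authors' algorithm. The names `ReplicaManager` and `BlockStats`, the access-count threshold, the idle timeout, and the cap on extra replicas are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BlockStats:
    """Bookkeeping for one data block (all fields are illustrative)."""
    access_count: int = 0      # accesses since the last replication decision
    last_access: float = 0.0   # timestamp of the most recent access
    extra_replicas: int = 0    # replicas added beyond the baseline


@dataclass
class ReplicaManager:
    """Hypothetical popularity-driven replica bookkeeping, not the paper's ARS."""
    popularity_threshold: int = 3   # assumed: accesses that trigger a new replica
    idle_timeout: float = 60.0      # assumed: seconds without access before removal
    max_extra_replicas: int = 2     # assumed: cap on added replicas per block
    blocks: dict = field(default_factory=dict)

    def record_access(self, block_id: str, now: float) -> bool:
        """Record one access; return True if a replica was added."""
        stats = self.blocks.setdefault(block_id, BlockStats())
        stats.access_count += 1
        stats.last_access = now
        if (stats.access_count >= self.popularity_threshold
                and stats.extra_replicas < self.max_extra_replicas):
            stats.extra_replicas += 1
            stats.access_count = 0  # restart the popularity window
            return True
        return False

    def evict_idle(self, now: float) -> list:
        """Drop one extra replica from each block idle longer than the timeout."""
        evicted = []
        for block_id, stats in self.blocks.items():
            idle_for = now - stats.last_access
            if stats.extra_replicas > 0 and idle_for > self.idle_timeout:
                stats.extra_replicas -= 1
                evicted.append(block_id)
        return evicted
```

With these assumed defaults, three calls to `record_access` on the same block add one replica, and a later `evict_idle` call made more than 60 seconds after the last access removes it again. A real scheme would also have to choose which node hosts the new replica, which this sketch leaves out.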


Metadata
Title
Performance Improvement of MapReduce for Heterogeneous Clusters Based on Efficient Locality and Replica Aware Scheduling (ELRAS) Strategy
Authors
J. V. Bibal Benifa
Dejey
Publication date
13-01-2017
Publisher
Springer US
Published in
Wireless Personal Communications / Issue 3/2017
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI
https://doi.org/10.1007/s11277-017-3953-5
