Published in: Cluster Computing 2/2018

24 July 2017

AEGEUS++: an energy-aware online partition skew mitigation algorithm for mapreduce in cloud

Authors: Vimalkumar Kumaresan, R. Baskaran, P. Dhavachelvan

Abstract

This paper investigates the partition skew problem in the reduce phase of MapReduce jobs. Our study examines skew mitigation in both offline and online settings. Offline approaches are heuristics-based: they wait for all map tasks to complete and incur computational overhead to estimate partition sizes. Online approaches redistribute the work of overloaded tasks across other nodes, which requires extra split and merge operations. These extra operations, together with ineffective resource utilization, degrade the performance of the entire system. In this paper, we propose Aegeus++, which addresses skew mitigation and adaptive data sampling for MapReduce jobs and builds an online prediction model with improved accuracy and minimal waiting time. In addition, we propose a near-linear skew detection algorithm and a fine-grained resource allocation algorithm that identify skewed partitions and allocate appropriate resources to reducers based on partition size. Finally, our energy-aware opportunistic frequency tuning algorithm improves the performance of reducer containers on the fly, so that skewed data is processed faster with minimal energy consumption. We evaluated Aegeus++ in a cloud setup using benchmark datasets and compared its performance with native Hadoop and other existing approaches. In our experiments, Aegeus++ improves overall application performance by 44% over native Hadoop and reduces energy consumption by 37.67% compared with existing approaches.
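The abstract names four mechanisms: online estimation of partition sizes from sampled map output, near-linear skew detection, resource allocation proportional to partition size, and per-reducer CPU frequency tuning. The paper's actual algorithms are not reproduced on this page, so the following Python sketch only illustrates the general ideas under simple, assumed models: final partition sizes are linearly extrapolated from the fraction of completed map tasks, a partition counts as skewed when its estimate exceeds the mean by more than k population standard deviations, spare vcores are split in proportion to estimated size, and a reducer runs at the lowest CPU frequency that still meets an assumed deadline. All function names, the threshold k, and the vcore, byte and frequency figures are hypothetical.

```python
from statistics import mean, pstdev


def estimate_partition_sizes(sampled_bytes, fraction_maps_done):
    """Extrapolate each reduce partition's final size (bytes) from the bytes
    observed so far. Assumes map output is spread evenly over map tasks."""
    if not 0 < fraction_maps_done <= 1:
        raise ValueError("fraction_maps_done must be in (0, 1]")
    return [b / fraction_maps_done for b in sampled_bytes]


def detect_skewed(sizes, k=1.5):
    """Return indices of partitions larger than mean + k * population std.
    Uses a constant number of linear passes over sizes, so O(n) overall."""
    threshold = mean(sizes) + k * pstdev(sizes)
    return [i for i, s in enumerate(sizes) if s > threshold]


def allocate_vcores(sizes, total_vcores, min_vcores=1):
    """Give every reducer min_vcores, then share the spare vcores in
    proportion to estimated partition size (largest-remainder rounding)."""
    n = len(sizes)
    if total_vcores < n * min_vcores:
        raise ValueError("not enough vcores for the minimum allocation")
    spare = total_vcores - n * min_vcores
    total = sum(sizes)
    shares = [spare * s / total if total else spare / n for s in sizes]
    alloc = [min_vcores + int(x) for x in shares]
    # Hand out the vcores lost to truncation, largest fractional part first.
    leftover = total_vcores - sum(alloc)
    by_remainder = sorted(range(n), key=lambda i: shares[i] - int(shares[i]),
                          reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


def pick_frequency(work_cycles, deadline_s, available_hz):
    """Lowest available CPU frequency that finishes work_cycles within
    deadline_s; falls back to the highest frequency if none does."""
    for f in sorted(available_hz):
        if work_cycles / f <= deadline_s:
            return f
    return max(available_hz)


if __name__ == "__main__":
    # Hypothetical bytes seen per partition after 25% of map tasks finished.
    sizes = estimate_partition_sizes([100, 120, 110, 900, 105], 0.25)
    print("estimated sizes:", sizes)
    print("skewed partitions:", detect_skewed(sizes))
    print("vcores per reducer:", allocate_vcores(sizes, total_vcores=16))
    print("frequency (Hz):", pick_frequency(6e9, 3.0, [1.2e9, 2.0e9, 2.6e9]))
```

In this example only the fourth partition (estimated at 3600 bytes) is flagged, and it receives 8 of the 16 vcores while the others get 2 each. Largest-remainder rounding keeps the total allocation exactly equal to the vcore budget. The modest default k is deliberate: by Samuelson's inequality, a single value among n can lie at most sqrt(n - 1) population standard deviations from the mean, so with only five reducers a threshold of k = 2 could never flag anything.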

Metadata
Title
AEGEUS++: an energy-aware online partition skew mitigation algorithm for mapreduce in cloud
Authors
Vimalkumar Kumaresan
R. Baskaran
P. Dhavachelvan
Publication date
24 July 2017
Publisher
Springer US
Published in
Cluster Computing / Issue 2/2018
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1044-8