Skip to main content
Erschienen in: Journal of Network and Systems Management 4/2016

01.10.2016

OPTIMA: On-Line Partitioning Skew Mitigation for MapReduce with Resource Adjustment

verfasst von: Zhihong Liu, Qi Zhang, Raouf Boutaba, Yaping Liu, Baosheng Wang

Erschienen in: Journal of Network and Systems Management | Ausgabe 4/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Partitioning skew has been shown to be a major issue that can significantly prolong the execution time of MapReduce jobs. Most of the existing off-line heuristics for partitioning skew mitigation are inefficient; they have to wait for the completion of all the map tasks. Some solutions can tackle this problem on-line, but will impose an additional overhead by repartitioning the workload of overloaded tasks. In this paper, we present OPTIMA, an on-line partitioning skew mitigation technique for MapReduce. OPTIMA predicts the workload distribution of reduce tasks at run-time, leverages the deviation detection technique to identify the overloaded tasks and pro-actively adjusts resource allocation for these tasks to reduce their execution time. We provide the upper bound of OPTIMA in time complexity, while allowing OPTIMA to perform totally on-line. Through experiments using both real and synthetic workloads running on an 11-node Hadoop cluster, we have observed OPTIMA can effectively mitigate the partitioning skew and improved the job completion time by up to 36.73 % in our experiments.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Since the memory requirement is related to the size of data, larger datasets are needed in order to clearly demonstrate the impact of memory allocation.
 
2
Using compression in Hadoop to optimize MapReduce performance is prevalent in industry and academia [6, 27, 34].
 
Literatur
1.
Zurück zum Zitat Ananthanarayanan, G., Hung, M.C.C., Ren, X., Stoica, I., Wierman, A., Yu, M.: Grass: trimming stragglers in approximation analytics. In: Proceedings of the 11th USENIX NSDI (2014) Ananthanarayanan, G., Hung, M.C.C., Ren, X., Stoica, I., Wierman, A., Yu, M.: Grass: trimming stragglers in approximation analytics. In: Proceedings of the 11th USENIX NSDI (2014)
2.
Zurück zum Zitat Ananthanarayanan, G., Kandula, S., Greenberg, A.G., Stoica, I., Lu, Y., Saha, B., Harris, E.: Reining in the outliers in map-reduce clusters using mantri. In: OSDI, vol. 10, p. 24. (2010) Ananthanarayanan, G., Kandula, S., Greenberg, A.G., Stoica, I., Lu, Y., Saha, B., Harris, E.: Reining in the outliers in map-reduce clusters using mantri. In: OSDI, vol. 10, p. 24. (2010)
3.
Zurück zum Zitat Arning, A., Agrawal, R., Raghavan, P.: A linear method for deviation detection in large databases. In: KDD, pp. 164–169. (1996) Arning, A., Agrawal, R., Raghavan, P.: A linear method for deviation detection in large databases. In: KDD, pp. 164–169. (1996)
4.
Zurück zum Zitat Bates, D.M., Watts, D.G.: Nonlinear Regression: Iterative Estimation and Linear Approximations. Wiley, New Jersey (1988) Bates, D.M., Watts, D.G.: Nonlinear Regression: Iterative Estimation and Linear Approximations. Wiley, New Jersey (1988)
5.
Zurück zum Zitat Borthakur, D.: The hadoop distributed file system: architecture and design. Hadoop Proj. Website 11, 21 (2007) Borthakur, D.: The hadoop distributed file system: architecture and design. Hadoop Proj. Website 11, 21 (2007)
6.
Zurück zum Zitat Chen, Y., Ganapathi, A., Katz, R.H.: To compress or not to compress-compute vs. io tradeoffs for mapreduce energy efficiency. In: Proceedings of the First ACM SIGCOMM Workshop on Green Networking, pp. 23–28. ACM (2010) Chen, Y., Ganapathi, A., Katz, R.H.: To compress or not to compress-compute vs. io tradeoffs for mapreduce energy efficiency. In: Proceedings of the First ACM SIGCOMM Workshop on Green Networking, pp. 23–28. ACM (2010)
7.
Zurück zum Zitat Chowdhury, M., Zaharia, M., Ma, J., Jordan, M.I., Stoica, I.: Managing data transfers in computer clusters with orchestra. In: ACM SIGCOMM Computer Communication Review, vol. 41, pp. 98–109. ACM (2011) Chowdhury, M., Zaharia, M., Ma, J., Jordan, M.I., Stoica, I.: Managing data transfers in computer clusters with orchestra. In: ACM SIGCOMM Computer Communication Review, vol. 41, pp. 98–109. ACM (2011)
8.
Zurück zum Zitat Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef
9.
Zurück zum Zitat Finch, T.: Incremental calculation of weighted mean and variance. University of Cambridge, Cambridge (2009) Finch, T.: Incremental calculation of weighted mean and variance. University of Cambridge, Cambridge (2009)
10.
Zurück zum Zitat Ghodsi, A., Zaharia, M., Hindman, B., Konwinski, A., Shenker, S., Stoica, I.: Dominant resource fairness: fair allocation of multiple resource types. In: NSDI, vol. 11, pp. 24–24 (2011) Ghodsi, A., Zaharia, M., Hindman, B., Konwinski, A., Shenker, S., Stoica, I.: Dominant resource fairness: fair allocation of multiple resource types. In: NSDI, vol. 11, pp. 24–24 (2011)
11.
Zurück zum Zitat Gufler, B., Augsten, N., Reiser, A., Kemper, A.: Handing data skew in mapreduce. In: Proceedings of the 1st International Conference on Cloud Computing and Services Science, vol. 146, pp. 574–583 (2011) Gufler, B., Augsten, N., Reiser, A., Kemper, A.: Handing data skew in mapreduce. In: Proceedings of the 1st International Conference on Cloud Computing and Services Science, vol. 146, pp. 574–583 (2011)
12.
Zurück zum Zitat Gufler, B., Augsten, N., Reiser, A., Kemper, A.: Load balancing in mapreduce based on scalable cardinality estimates. In: Data Engineering (ICDE), 2012 IEEE 28th International Conference on, pp. 522–533. IEEE (2012) Gufler, B., Augsten, N., Reiser, A., Kemper, A.: Load balancing in mapreduce based on scalable cardinality estimates. In: Data Engineering (ICDE), 2012 IEEE 28th International Conference on, pp. 522–533. IEEE (2012)
15.
Zurück zum Zitat Hammoud, M., Rehman, M.S., Sakr, M.F.: Center-of-gravity reduce task scheduling to lower mapreduce network traffic. In: Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on, pp. 49–58. IEEE (2012) Hammoud, M., Rehman, M.S., Sakr, M.F.: Center-of-gravity reduce task scheduling to lower mapreduce network traffic. In: Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on, pp. 49–58. IEEE (2012)
16.
Zurück zum Zitat Ibrahim, S., Jin, H., Lu, L., He, B., Antoniu, G., Wu, S.: Handling partitioning skew in mapreduce using leen. Peer Peer Netw. Appl. 6(4), 409–424 (2013)CrossRef Ibrahim, S., Jin, H., Lu, L., He, B., Antoniu, G., Wu, S.: Handling partitioning skew in mapreduce using leen. Peer Peer Netw. Appl. 6(4), 409–424 (2013)CrossRef
17.
Zurück zum Zitat Jain, R., Chiu, D.M., Hawe, W.R.: A quantitative measure of fairness and discrimination for resource allocation in shared computer system (1984) Jain, R., Chiu, D.M., Hawe, W.R.: A quantitative measure of fairness and discrimination for resource allocation in shared computer system (1984)
18.
Zurück zum Zitat Jalaparti, V., Ballani, H., Costa, P., Karagiannis, T., Rowstron, A.: Bridging the tenant-provider gap in cloud services. In: Proceedings of the Third ACM Symposium on Cloud Computing, p. 10. ACM (2012) Jalaparti, V., Ballani, H., Costa, P., Karagiannis, T., Rowstron, A.: Bridging the tenant-provider gap in cloud services. In: Proceedings of the Third ACM Symposium on Cloud Computing, p. 10. ACM (2012)
19.
Zurück zum Zitat Kang, J.M., Bannazadeh, H., Leon-Garcia, A.: Savi testbed: Control and management of converged virtual ict resources. In: IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), 2013 pp. 664–667. IEEE (2013) Kang, J.M., Bannazadeh, H., Leon-Garcia, A.: Savi testbed: Control and management of converged virtual ict resources. In: IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), 2013 pp. 664–667. IEEE (2013)
20.
Zurück zum Zitat Kirby, G.: Zipf’s law. UK J. Nav. Sci. 10(3), 180–185 (1985) Kirby, G.: Zipf’s law. UK J. Nav. Sci. 10(3), 180–185 (1985)
21.
Zurück zum Zitat Kwon, Y., Balazinska, M., Howe, B., Rolia, J.: Skewtune: mitigating skew in mapreduce applications. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 25–36. ACM (2012) Kwon, Y., Balazinska, M., Howe, B., Rolia, J.: Skewtune: mitigating skew in mapreduce applications. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 25–36. ACM (2012)
22.
Zurück zum Zitat Le, Y., Liu, J., Ergun, F., Wang, D.: Online load balancing for mapreduce with skewed data input. In: INFOCOM, 2014 Proceedings IEEE, pp. 2004–2012. IEEE (2014) Le, Y., Liu, J., Ergun, F., Wang, D.: Online load balancing for mapreduce with skewed data input. In: INFOCOM, 2014 Proceedings IEEE, pp. 2004–2012. IEEE (2014)
24.
Zurück zum Zitat Lin, J., Dyer, C.: Data-intensive text processing with mapreduce. Synth. Lect. Hum. Lang. Technol. 3(1), 1–177 (2010b)CrossRef Lin, J., Dyer, C.: Data-intensive text processing with mapreduce. Synth. Lect. Hum. Lang. Technol. 3(1), 1–177 (2010b)CrossRef
25.
Zurück zum Zitat Liu, Z., Zhang, Q., Zhani, M.F., Boutaba, R., Liu, Y., Gong, Z.: Dreams: Dynamic resource allocation for mapreduce with data skew. In: IFIP/IEEE International Symposium on Integrated Network Management (IM 2015), 2015. Ottawa (2015) Liu, Z., Zhang, Q., Zhani, M.F., Boutaba, R., Liu, Y., Gong, Z.: Dreams: Dynamic resource allocation for mapreduce with data skew. In: IFIP/IEEE International Symposium on Integrated Network Management (IM 2015), 2015. Ottawa (2015)
26.
Zurück zum Zitat Papadimitriou, C.H.: Computational Complexity. Wiley, New Jersey (2003)MATH Papadimitriou, C.H.: Computational Complexity. Wiley, New Jersey (2003)MATH
28.
Zurück zum Zitat Polo, J., Carrera, D., Becerra, Y., Torres, J., Ayguadé, E., Steinder, M., Whalley, I.: Performance-driven task co-scheduling for mapreduce environments. In: Network Operations and Management Symposium (NOMS), 2010 IEEE, pp. 373–380. IEEE (2010) Polo, J., Carrera, D., Becerra, Y., Torres, J., Ayguadé, E., Steinder, M., Whalley, I.: Performance-driven task co-scheduling for mapreduce environments. In: Network Operations and Management Symposium (NOMS), 2010 IEEE, pp. 373–380. IEEE (2010)
29.
Zurück zum Zitat Ramakrishnan, S.R., Swart, G., Urmanov, A.: Balancing reducer skew in mapreduce workloads using progressive sampling. In: Proceedings of the Third ACM Symposium on Cloud Computing, p. 16. ACM (2012) Ramakrishnan, S.R., Swart, G., Urmanov, A.: Balancing reducer skew in mapreduce workloads using progressive sampling. In: Proceedings of the Third ACM Symposium on Cloud Computing, p. 16. ACM (2012)
30.
Zurück zum Zitat Sharma, B., Prabhakar, R., Lim, S., Kandemir, M.T., Das, C.R.: Mrorchestrator: A fine-grained resource orchestration framework for mapreduce clusters. In: IEEE 5th International Conference on Cloud Computing (CLOUD), 2012, pp. 1–8. IEEE (2012) Sharma, B., Prabhakar, R., Lim, S., Kandemir, M.T., Das, C.R.: Mrorchestrator: A fine-grained resource orchestration framework for mapreduce clusters. In: IEEE 5th International Conference on Cloud Computing (CLOUD), 2012, pp. 1–8. IEEE (2012)
31.
Zurück zum Zitat Tan, P.N., Steinbach, M., Kumar, V., et al.: Introduction to Data Mining, vol. 1. Pearson Addison Wesley, Boston (2006) Tan, P.N., Steinbach, M., Kumar, V., et al.: Introduction to Data Mining, vol. 1. Pearson Addison Wesley, Boston (2006)
32.
Zurück zum Zitat Vavilapalli, V.K., Murthy, A.C., Douglas, C., Agarwal, S., Konar, M., Evans, R., Graves, T., Lowe, J., Shah, H., Seth, S., et al.: Apache hadoop yarn: Yet another resource negotiator. In: Proceedings of the 4th annual Symposium on Cloud Computing, p. 5. ACM (2013) Vavilapalli, V.K., Murthy, A.C., Douglas, C., Agarwal, S., Konar, M., Evans, R., Graves, T., Lowe, J., Shah, H., Seth, S., et al.: Apache hadoop yarn: Yet another resource negotiator. In: Proceedings of the 4th annual Symposium on Cloud Computing, p. 5. ACM (2013)
33.
Zurück zum Zitat Verma, A., Cherkasova, L., Campbell, R.H.: Aria: automatic resource inference and allocation for mapreduce environments. In: Proceedings of the 8th ACM International Conference on Autonomic Computing, pp. 235–244. ACM (2011) Verma, A., Cherkasova, L., Campbell, R.H.: Aria: automatic resource inference and allocation for mapreduce environments. In: Proceedings of the 8th ACM International Conference on Autonomic Computing, pp. 235–244. ACM (2011)
34.
Zurück zum Zitat White, T.: Hadoop: The definitive guide. O’Reilly Media Inc, California (2012) White, T.: Hadoop: The definitive guide. O’Reilly Media Inc, California (2012)
35.
Zurück zum Zitat Wolf, J., Rajan, D., Hildrum, K., Khandekar, R., Kumar, V., Parekh, S., Wu, K.L., Balmin, A.: Flex: A slot allocation scheduling optimizer for mapreduce workloads. In: Middleware 2010, pp. 1–20. Springer (2010) Wolf, J., Rajan, D., Hildrum, K., Khandekar, R., Kumar, V., Parekh, S., Wu, K.L., Balmin, A.: Flex: A slot allocation scheduling optimizer for mapreduce workloads. In: Middleware 2010, pp. 1–20. Springer (2010)
36.
Zurück zum Zitat Yadwadkar, N.J., Ananthanarayanan, G., Katz, R.: Wrangler: Predictable and faster jobs using fewer resources. In: Proceedings of the ACM Symposium on Cloud Computing, pp. 1–14. ACM (2014) Yadwadkar, N.J., Ananthanarayanan, G., Katz, R.: Wrangler: Predictable and faster jobs using fewer resources. In: Proceedings of the ACM Symposium on Cloud Computing, pp. 1–14. ACM (2014)
37.
Zurück zum Zitat Zacheilas, N., Kalogeraki, V.: Real-time scheduling of skewed mapreduce jobs in heterogeneous environments. In: Proceedings of 11th International Conference on Autonomic Computing, pp. 189–200. USENIX (2014) Zacheilas, N., Kalogeraki, V.: Real-time scheduling of skewed mapreduce jobs in heterogeneous environments. In: Proceedings of 11th International Conference on Autonomic Computing, pp. 189–200. USENIX (2014)
38.
Zurück zum Zitat Zaharia, M., Konwinski, A., Joseph, A.D., Katz, R.H., Stoica, I.: Improving mapreduce performance in heterogeneous environments. In: OSDI, vol. 8, p. 7 (2008) Zaharia, M., Konwinski, A., Joseph, A.D., Katz, R.H., Stoica, I.: Improving mapreduce performance in heterogeneous environments. In: OSDI, vol. 8, p. 7 (2008)
39.
Zurück zum Zitat Zhang, Z., Cherkasova, L., Loo, B.T.: Autotune: Optimizing execution concurrency and resource usage in mapreduce workflows. In: ICAC, pp. 175–181 (2013) Zhang, Z., Cherkasova, L., Loo, B.T.: Autotune: Optimizing execution concurrency and resource usage in mapreduce workflows. In: ICAC, pp. 175–181 (2013)
Metadaten
Titel
OPTIMA: On-Line Partitioning Skew Mitigation for MapReduce with Resource Adjustment
verfasst von
Zhihong Liu
Qi Zhang
Raouf Boutaba
Yaping Liu
Baosheng Wang
Publikationsdatum
01.10.2016
Verlag
Springer US
Erschienen in
Journal of Network and Systems Management / Ausgabe 4/2016
Print ISSN: 1064-7570
Elektronische ISSN: 1573-7705
DOI
https://doi.org/10.1007/s10922-015-9362-8

Weitere Artikel der Ausgabe 4/2016

Journal of Network and Systems Management 4/2016 Zur Ausgabe

Premium Partner