2019 | OriginalPaper | Chapter

Heterogeneity-Aware Data Placement in Hybrid Clouds

Authors: Jack D. Marquez, Juan D. Gonzalez, Oscar H. Mondragon

Published in: Cloud Computing – CLOUD 2019

Publisher: Springer International Publishing

Abstract

In next-generation cloud computing clusters, the performance of data-intensive applications will be limited, among other factors, by disk data transfer rates. To mitigate this performance impact, cloud systems offering hierarchical storage architectures are becoming commonplace. The Hadoop Distributed File System (HDFS) offers a collection of storage policies that exploit different storage types such as RAM_DISK, SSD, HDD, and ARCHIVE. However, developing algorithms that leverage heterogeneous storage through efficient data placement has been challenging. This work presents an intelligent algorithm based on genetic programming that makes it possible to find the optimal mapping of input datasets to storage types on a Hadoop file system.
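To make the idea of mapping datasets to storage types concrete, the following is a minimal sketch of a simple genetic algorithm, not the authors' actual method. The tier names come from the abstract; the throughput and capacity figures, the cost model (read time plus a capacity penalty), and the function names `cost` and `evolve` are all assumptions made for illustration.

```python
import random

# Storage types named in the abstract. The throughput (MB/s) and capacity (MB)
# figures are hypothetical values chosen only for this illustration.
TIERS = [
    ("RAM_DISK", 2000.0, 1000),
    ("SSD", 500.0, 4000),
    ("HDD", 150.0, 20000),
    ("ARCHIVE", 50.0, 100000),
]


def cost(assignment, sizes, penalty=1e6):
    """Estimated total read time, plus a penalty for exceeding tier capacity."""
    used = [0.0] * len(TIERS)
    read_time = 0.0
    for size, tier in zip(sizes, assignment):
        used[tier] += size
        read_time += size / TIERS[tier][1]
    overflow = sum(max(0.0, u - TIERS[i][2]) for i, u in enumerate(used))
    return read_time + penalty * overflow


def evolve(sizes, pop_size=40, generations=200, p_mut=0.1, seed=0):
    """Search for a low-cost dataset-to-tier assignment (needs len(sizes) >= 2)."""
    rng = random.Random(seed)
    n = len(sizes)
    # Start from a baseline that puts every dataset on the last (largest) tier,
    # then fill the rest of the population with random assignments.
    pop = [[len(TIERS) - 1] * n]
    while len(pop) < pop_size:
        pop.append([rng.randrange(len(TIERS)) for _ in range(n)])
    for _ in range(generations):
        pop.sort(key=lambda a: cost(a, sizes))
        elite = pop[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n)         # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [
                rng.randrange(len(TIERS)) if rng.random() < p_mut else gene
                for gene in child
            ]                                 # per-gene random mutation
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda a: cost(a, sizes))


if __name__ == "__main__":
    dataset_sizes = [800, 1500, 3000, 6000, 12000]  # hypothetical sizes in MB
    best = evolve(dataset_sizes)
    for size, tier in zip(dataset_sizes, best):
        print(f"{size} MB -> {TIERS[tier][0]}")
```

Because the better half of the population is carried over unchanged each generation, the returned assignment is never worse, under this cost model, than the all-ARCHIVE baseline it starts from. A real deployment would replace the cost model with measured job behavior and apply the result through HDFS storage policies.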

Metadata
Title
Heterogeneity-Aware Data Placement in Hybrid Clouds
Authors
Jack D. Marquez
Juan D. Gonzalez
Oscar H. Mondragon
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-23502-4_13