2018 | OriginalPaper | Chapter

Data Allocation Based on Evolutionary Data Popularity Clustering

Authors: Ralf Vamosi, Mario Lassnig, Erich Schikuta

Published in: Computational Science – ICCS 2018

Publisher: Springer International Publishing

Abstract

This study is motivated by the high-energy physics experiment ATLAS, one of the four major experiments at the Large Hadron Collider at CERN. ATLAS comprises 130 data centers worldwide with datasets in the petabyte range. When data is processed across the grid, transfer delays and the resulting performance loss have emerged as an issue. The two major costs are the waiting time until input data is ready and the job computation time. In the ATLAS workflows, the input to computational jobs is based on grouped datasets. The waiting time stems mainly from WAN transfers between data centers, which occur when job properties require execution at one data center but the dataset is distributed among several. The proposed data allocation algorithm redistributes the constituent files of datasets so that job efficiency increases in terms of a cost metric. It is an evolutionary algorithm that addresses the data allocation problem in a network based on data popularity and clustering. The expected number of file transfers per job is used as the target metric, and it is shown that job waiting times can be decreased because input data becomes ready sooner.
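The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the cost model, the swap mutation, and all names (`expected_transfers`, `evolve`, the toy datasets and sites) are assumptions made for illustration, and the clustering step is omitted. The sketch assumes that a job runs at the site holding most of its dataset's files and that every file stored elsewhere costs one WAN transfer, weighted by dataset popularity.

```python
import random
from collections import Counter


def expected_transfers(allocation, datasets, popularity):
    """Popularity-weighted number of files that jobs must fetch remotely.

    Assumption: a job on a dataset runs at the site that already holds
    most of the dataset's files; each other file is one transfer.
    """
    total = 0.0
    for name, files in datasets.items():
        per_site = Counter(allocation[f] for f in files)
        remote = len(files) - max(per_site.values())
        total += popularity[name] * remote
    return total


def evolve(allocation, datasets, popularity, generations=2000, seed=0):
    """(1+1) evolutionary search: swap the sites of two random files.

    Swapping keeps the number of files per site constant, so storage
    load is preserved without an explicit capacity check.
    """
    rng = random.Random(seed)
    best = dict(allocation)
    best_cost = expected_transfers(best, datasets, popularity)
    files = sorted(best)
    for _ in range(generations):
        a, b = rng.sample(files, 2)
        child = dict(best)
        child[a], child[b] = best[b], best[a]
        cost = expected_transfers(child, datasets, popularity)
        if cost <= best_cost:  # keep the child if it is no worse
            best, best_cost = child, cost
    return best, best_cost


# Toy input (hypothetical): three datasets of four files, spread
# round-robin over three sites so every dataset starts fragmented.
datasets = {name: [f"{name}{i}" for i in range(4)] for name in "ABC"}
popularity = {"A": 5.0, "B": 3.0, "C": 1.0}
all_files = [f for files in datasets.values() for f in files]
start = {f: f"site{i % 3}" for i, f in enumerate(all_files)}

start_cost = expected_transfers(start, datasets, popularity)
best, best_cost = evolve(start, datasets, popularity)
print("cost before:", start_cost, "cost after:", best_cost)
```

The swap mutation is one simple way to respect storage limits. A real deployment would also need file sizes, replicas, and network bandwidth, none of which this sketch models.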

Metadata

Title: Data Allocation Based on Evolutionary Data Popularity Clustering
Authors: Ralf Vamosi, Mario Lassnig, Erich Schikuta
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-93698-7_12
