2018 | OriginalPaper | Book chapter

Data Allocation Based on Evolutionary Data Popularity Clustering

Authors: Ralf Vamosi, Mario Lassnig, Erich Schikuta

Published in: Computational Science – ICCS 2018

Publisher: Springer International Publishing

Abstract

This study is motivated by ATLAS, a high-energy physics experiment and one of the four major experiments at the Large Hadron Collider at CERN. ATLAS comprises 130 data centers worldwide with datasets in the petabyte range. When data is processed across the grid, transfer delays and the resulting performance loss have emerged as an issue. The two major costs are the waiting time until input data is ready and the job computation time. In the ATLAS workflows, the input to computational jobs is organized in grouped datasets. The waiting time stems mainly from WAN transfers between data centers, which occur when job properties require execution at one data center but the dataset is distributed across several. The proposed data allocation algorithm redistributes the constituent files of datasets so that job efficiency increases with respect to a cost metric. It is an evolutionary algorithm that addresses the data allocation problem in a network based on data popularity and clustering. The expected number of file transfers per job is used as the target metric, and it is shown that job waiting times can be decreased through faster input data readiness.
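The idea of minimizing expected file transfers with an evolutionary search can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives no implementation details, so the function names, the assumption that a job runs at the site already holding most of its input files, the per-site `capacity` limit, and the simple (1+1) mutate-and-select loop are all hypothetical choices made for illustration.

```python
import random


def expected_transfers(allocation, jobs):
    """Popularity-weighted number of files that must cross the WAN.

    allocation: dict mapping file name -> site name.
    jobs: list of (files, weight) pairs; weight stands in for popularity
          (assumed here to be the expected number of runs of that job).
    Assumption: each job runs at the site that already holds most of its
    input files, so only the remaining files need to be transferred.
    """
    total = 0.0
    for files, weight in jobs:
        per_site = {}
        for f in files:
            site = allocation[f]
            per_site[site] = per_site.get(site, 0) + 1
        total += weight * (len(files) - max(per_site.values()))
    return total


def evolve(allocation, jobs, sites, capacity, generations=2000, seed=0):
    """(1+1) evolutionary search: move one file, keep the child if not worse."""
    rng = random.Random(seed)
    best = dict(allocation)
    best_cost = expected_transfers(best, jobs)
    files = sorted(best)
    for _ in range(generations):
        child = dict(best)
        moved = rng.choice(files)
        child[moved] = rng.choice(sites)
        # Reject children that overfill the receiving site (hypothetical limit).
        load = sum(1 for s in child.values() if s == child[moved])
        if load > capacity:
            continue
        cost = expected_transfers(child, jobs)
        if cost <= best_cost:
            best, best_cost = child, cost
    return best, best_cost


# Toy input: two datasets, each split across two sites.
initial = {"a": "X", "b": "Y", "c": "X", "d": "Y"}
jobs = [(["a", "b"], 3.0), (["c", "d"], 1.0)]
initial_cost = expected_transfers(initial, jobs)
final, final_cost = evolve(initial, jobs, sites=["X", "Y"], capacity=3)
print(initial_cost, final_cost)
```

In this toy setting the search tends to co-locate files that are requested together, with the more popular job weighted more heavily. A realistic version would also need file sizes, network topology, and job placement constraints, none of which are modeled here.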

Metadata
Title
Data Allocation Based on Evolutionary Data Popularity Clustering
Authors
Ralf Vamosi
Mario Lassnig
Erich Schikuta
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-93698-7_12