Published in: The Journal of Supercomputing 7/2016

01.07.2016

DCCP: an effective data placement strategy for data-intensive computations in distributed cloud computing systems

Authors: Tao Wang, Shihong Yao, Zhengquan Xu, Shan Jia


Abstract

Cloud computing systems provide high-performance computing resources and distributed storage for data-intensive computations. Data scheduling between data centers is becoming indispensable in cloud systems where many large datasets are stored at different data centers and data analytics requires inter-center data access. However, the performance of data scheduling depends heavily on how sensibly the data are placed. Data placement is a key optimization method for reducing data scheduling between data centers and achieving statistical I/O load balancing, which in turn reduces the mean computation execution time. This paper proposes a data placement strategy, DCCP, based on dynamic computation correlation. DCCP places datasets with high dynamic computation correlation at the same data center, taking into account the I/O load and capacity load of the data centers; when computations are scheduled to that data center, most of the datasets they process are stored locally, so the mean computation execution time can be reduced. Extensive experiments show that DCCP achieves statistical I/O load balancing and capacity load balancing across data centers, thereby reducing the total data scheduling between data centers as far as possible at very low time complexity, even as the numbers of datasets and data centers increase.
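The abstract does not define how the correlation or the load constraints are computed, so the following is only a minimal sketch of the general idea, not the authors' DCCP algorithm. It assumes that the correlation between two datasets is the number of computations that read both, that a dataset's I/O load is the number of computations that read it, and that each data center has a fixed storage capacity and a shared I/O budget. The names `pair_correlation`, `place`, `remote_accesses`, `capacity` and `io_limit` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_correlation(computations):
    """Count how many computations read each pair of datasets together."""
    corr = Counter()
    for datasets in computations:
        for pair in combinations(sorted(set(datasets)), 2):
            corr[pair] += 1
    return corr


def place(sizes, computations, capacity, io_limit):
    """Greedy, correlation-aware placement (illustrative, not the paper's DCCP)."""
    corr = pair_correlation(computations)
    # Assumption: a dataset's access count stands in for its I/O load.
    access = Counter(d for datasets in computations for d in set(datasets))
    used = {c: 0 for c in capacity}      # storage consumed per data center
    io = {c: 0 for c in capacity}        # I/O load assigned per data center
    members = {c: [] for c in capacity}  # datasets already placed there
    placement = {}
    # Place the most frequently read datasets first; break ties by name.
    for d in sorted(sizes, key=lambda name: (-access[name], name)):
        best, best_key = None, None
        for c in capacity:
            if used[c] + sizes[d] > capacity[c]:
                continue  # would exceed the storage capacity
            if io[c] + access[d] > io_limit:
                continue  # would exceed the I/O budget
            # Total correlation between d and the datasets already at c.
            affinity = sum(corr[tuple(sorted((d, m)))] for m in members[c])
            # Prefer high affinity, then the less loaded data center.
            key = (affinity, -io[c], -used[c])
            if best_key is None or key > best_key:
                best, best_key = c, key
        if best is None:
            raise ValueError(f"no data center can host dataset {d!r}")
        placement[d] = best
        members[best].append(d)
        used[best] += sizes[d]
        io[best] += access[d]
    return placement


def remote_accesses(placement, computations):
    """Datasets fetched remotely if each computation runs where most of its data is."""
    total = 0
    for datasets in computations:
        unique = set(datasets)
        if not unique:
            continue
        per_center = Counter(placement[d] for d in unique)
        total += len(unique) - max(per_center.values())
    return total


# Toy workload: datasets a and b are read together, as are c and d.
sizes = {"a": 4, "b": 4, "c": 4, "d": 4}
computations = [["a", "b"], ["a", "b"], ["c", "d"], ["c", "d"], ["a", "c"]]
capacity = {"dc1": 8, "dc2": 8}

placement = place(sizes, computations, capacity, io_limit=5)
print(placement)
print(remote_accesses(placement, computations))
```

In this toy workload the I/O budget keeps the two most frequently read datasets, a and c, apart, and each then attracts its correlated partner, so only the single computation that reads both a and c needs an inter-center transfer. A greedy pass like this gives no optimality guarantee; it only illustrates how correlation, I/O load and capacity can be weighed together in one placement decision.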


Metadata
Title
DCCP: an effective data placement strategy for data-intensive computations in distributed cloud computing systems
Authors
Tao Wang
Shihong Yao
Zhengquan Xu
Shan Jia
Publication date
01.07.2016
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 7/2016
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-015-1511-z
