Published in: The Journal of Supercomputing 1/2014

1 July 2014

Characterizing and modeling cloud applications/jobs on a Google data center

Authors: Sheng Di, Derrick Kondo, Franck Cappello

Abstract

In this paper, we characterize and model Google applications and jobs based on a 1-month trace from a large-scale Google data center. We make four contributions: (1) we compute statistics on task events and resource utilization for Google applications, broken down by resource type and execution type; (2) we classify applications with a K-means clustering algorithm, using an optimized number of sets, based on task events and resource usage; (3) we study the correlation between Google application properties and running features (e.g., job priority and scheduling class); (4) we build a model that can simulate Google jobs/tasks and dynamic events in accordance with the Google trace. Experiments show that tasks simulated with our model exhibit features closely matching those in the Google trace: for more than 95 % of tasks, the simulation error is below 20 %, confirming the high accuracy of our simulation model.
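The clustering step named in contribution (2) can be illustrated with a minimal sketch of plain Lloyd's K-means. This is not the authors' implementation: the function names, the two-feature task vectors (scaled CPU and memory usage), the sample values, and the fixed number of clusters are all assumptions made for illustration, and the paper's procedure for optimizing the number of sets is not reproduced here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=100):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples.

    Assumption: centroids start at the first k points, a simplification
    chosen to keep the sketch deterministic.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid
        # (ties go to the centroid with the lower index).
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Hypothetical (scaled CPU, scaled memory) usage per task; illustrative
# values only, not taken from the Google trace.
tasks = [
    (0.01, 0.02), (0.02, 0.01), (0.015, 0.015),  # light tasks
    (0.50, 0.60), (0.55, 0.58), (0.52, 0.61),    # heavy tasks
]
labels, centroids = kmeans(tasks, k=2)
print(labels)
print(centroids)
```

In practice the number of clusters would be chosen by a separate criterion, and features on different scales would usually be normalized before clustering.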

Footnotes
1
Scheduling class (0–3), according to [3], roughly represents how latency sensitive a job/task is, with 3 representing a more latency-sensitive task and 0 representing a non-production task.
 
2
The Google trace does not expose the exact memory size used by jobs, only values scaled relative to the maximum memory capacity of a node. For example, if the maximum memory capacity on a host is 64 GB, a memory size of 0.05 corresponds to \(0.05 \times 64 = 3.2\) GB.
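The conversion in this footnote can be written as a one-line helper. This is only a sketch: the function name is hypothetical, and the 64 GB capacity is the footnote's own example value, since the trace does not disclose real node capacities.

```python
def scaled_memory_to_gb(scaled_value, max_capacity_gb):
    """Convert a scaled memory value from the trace to gigabytes.

    max_capacity_gb is an assumed capacity of the largest node; the
    trace itself only provides the scaled value.
    """
    if scaled_value < 0:
        raise ValueError("scaled memory value must be non-negative")
    return scaled_value * max_capacity_gb


# The footnote's example: a scaled value of 0.05 on an assumed 64 GB host.
print(scaled_memory_to_gb(0.05, 64))
```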
 
3
According to the Google trace documentation [4], task interruptions have several causes: (1) failure event: a task or job was descheduled (or, in rare cases, ceased to be eligible for scheduling while it was pending) due to a task failure; (2) evict event: a task or job was descheduled because of a higher-priority task or job, because the scheduler overcommitted and the actual demand exceeded the machine capacity, because the machine on which it was running became unusable, or because a disk holding the task’s data was lost; (3) kill event: a task or job was canceled, or another job or task on which it depended died; (4) lost event: a task or job was presumably terminated, but a record indicating its termination is missing.
 
References
1. Armbrust M, Fox A, Griffith R, Joseph A et al (2009) Above the clouds: a Berkeley view of cloud computing. EECS, University of California, Berkeley, Technical Report UCB/EECS-2009-28
2. Vaquero L, Rodero-Merino L, Caceres J, Lindner M (2009) A break in the clouds: towards a cloud definition. SIGCOMM Comput Commun Rev 39(1):50–55
4. Reiss C, Wilkes J, Hellerstein J (2012) Google cluster-usage traces: format + schema. Google Inc., Mountain View, USA, Technical Report
5. Di S, Kondo D, Cirne W (2012) Characterization and comparison of cloud versus grid workloads. IEEE international conference on cluster computing (cluster’12), pp 230–238
6. Meng X, Isci C, Kephart J, Zhang L, Bouillet E, Pendarakis D (2010) Efficient resource provisioning in compute clouds via vm multiplexing. In: Proceedings of the 7th international conference on autonomic computing (ICAC’10), New York, ACM, pp 11–20
7. Buyya R, Ranjan R, Calheiros R (2010) Intercloud: utility-oriented federation of cloud computing environments for scaling of application services. In: 10th international conference on algorithms and architectures for parallel processing (ICA3PP’10), pp 13–31
8. Stillwell M, Vivien F, Casanova H (2012) Virtual machine resource allocation for service hosting on heterogeneous distributed platforms. In: Proceedings of IEEE 26th international conference on parallel distributed processing symposium (IPDPS’12), pp 786–797
9. Calheiros R, Ranjan R, Beloglazov A, De-Rose C, Buyya R (2011) Cloudsim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Softw Pract Exp 41(1):23–50
10. Di S, Wang C-L (2013) Dynamic optimization of multi-attribute resource allocation in self-organizing clouds. IEEE Trans Parallel Distrib Syst (TPDS) 24(3):464–478
11. Dean J, Ghemawat S (2004) MapReduce: Simplified data processing on large clusters. In: 5th USENIX symposium on operating systems design and implementation (OSDI’04), pp 137–150
12. Reiss C, Tumanov A, Ganger G, Katz R, Kozuch M (2012) Towards understanding heterogeneous clouds at scale: Google trace analysis. Intel science and technology center for cloud computing. Carnegie Mellon University, Pittsburgh, Technical Report ISTC-CC-TR-12-101
14. Koch R (1997) The 80/20 principle: the secret of achieving more with less. Nicholas Brealey
15. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp 281–297
16. Okabe A, Boots B, Sugihara K, Chiu S (2000) Spatial tessellations: concepts and applications of Voronoi diagrams, 2nd edn. Series in probability and statistics. Wiley, England
17. Ross S (2010) Introduction to probability models, 10th edn. Academic Press, Burlington
18. Sharma B, Chudnovsky V, Hellerstein J, Rifaat R, Das C (2011) Modeling and synthesizing task placement constraints in Google compute clusters. In: Proceedings of the 2nd ACM symposium on cloud computing (SOCC’11), New York, ACM, pp 3:1–3:14
19. Mishra A, Hellerstein J, Cirne W, Das C-R (2010) Towards characterizing cloud backend workloads: insights from Google compute clusters. SIGMETRICS Perform Eval Rev 37(4):34–41
20. Zhang Q, Hellerstein JL, Boutaba R (2011) Characterizing task usage shapes in Google compute clusters. Large scale distributed systems and middleware, workshop (LADIS’11)
21. Liu Z, Cho S (2012) Characterizing machines and workloads on a Google cluster. In: 8th international workshop on scheduling and resource management for parallel and distributed systems (SRMPDS’12), pp 397–403
22. Ganapathi A, Chen Y, Fox A, Katz RH, Patterson DA (2010) Statistics-driven workload modeling for the cloud. ICDE workshops’10, pp 87–92
23. Shvachko K, Kuang H, Radia S, Chansler R (2010) The Hadoop distributed file system. In: IEEE 26th symposium on mass storage systems and technologies (MSST’10), pp 1–10
24. Li A, Zong X, Kandula S, Yang X, Zhang M (2011) Cloudprophet: Towards application performance prediction in cloud. ACM SIGCOMM student poster, pp 426–427
25. Jackson KR, Ramakrishnan L, Muriki K et al (2010) Performance analysis of high performance computing applications on the Amazon Web Services cloud. In: Proceedings of the IEEE 2nd international conference on cloud computing technology and science (CloudCom’10). Washington, DC, IEEE Computer Society, pp 159–168
26. Hamerly G, Elkan C (2002) Alternatives to the k-means algorithm that find better clusterings. In: Proceedings of the 17th international conference on Information and knowledge management (CIKM’02), New York, ACM, pp 600–607
Metadata
Title
Characterizing and modeling cloud applications/jobs on a Google data center
Authors
Sheng Di
Derrick Kondo
Franck Cappello
Publication date
1 July 2014
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 1/2014
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-014-1131-z
