
Characterizing and modeling cloud applications/jobs on a Google data center

Authors: Sheng Di, Derrick Kondo, Franck Cappello

Published in: The Journal of Supercomputing | Issue 1/2014 (1 July 2014)


Abstract

In this paper, we characterize and model Google applications and jobs based on a 1-month trace from a large-scale Google data center. We make four contributions: (1) we compute statistics on task events and resource utilization for Google applications, across various resource types and execution types; (2) we classify applications via a K-means clustering algorithm with an optimized number of sets, based on task events and resource usage; (3) we study the correlation between Google application properties and running features (e.g., job priority and scheduling class); (4) we build a model that can simulate Google jobs/tasks and dynamic events in accordance with the Google trace. Experiments show that tasks simulated with our model exhibit features closely analogous to those in the Google trace. For 95+ % of tasks, the simulation error is below 20 %, confirming the high accuracy of our simulation model.


Scheduling class (0–3), according to [3], roughly represents how latency sensitive a job/task is, with 3 representing a more latency-sensitive task and 0 representing a non-production task.

The Google trace does not expose the exact memory size used by jobs, only values scaled relative to the maximum memory capacity of each node. For example, if the maximum memory capacity of a host is 64 GB, a memory size of 0.05 corresponds to \(0.05 \times 64 = 3.2\) GB.
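The scaling in this example can be reproduced in a few lines. The following is a minimal sketch: the helper name `to_gigabytes` is hypothetical, and the 64 GB capacity is the illustrative figure from the example, not a value published in the trace.

```python
def to_gigabytes(scaled_value: float, max_capacity_gb: float) -> float:
    """Convert a scaled memory value back to gigabytes.

    Hypothetical helper: the trace publishes only the scaled value, so
    max_capacity_gb is an assumption the analyst has to supply.
    """
    if scaled_value < 0.0:
        raise ValueError("scaled_value must be non-negative")
    return scaled_value * max_capacity_gb


# Example from the text: a scaled value of 0.05 on a 64 GB host
memory_gb = to_gigabytes(0.05, 64.0)
print(f"{memory_gb:.1f} GB")  # prints "3.2 GB"
```

Because the true capacity is not exposed, absolute sizes obtained this way are only as reliable as the assumed capacity; comparisons between scaled values remain valid without it.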

According to the Google trace documentation [4], task interruptions have several causes: (1) failure event: a task or job was descheduled (or, in rare cases, ceased to be eligible for scheduling while it was pending) due to a task failure; (2) evict event: a task or job was descheduled because of a higher-priority task or job, because the scheduler overcommitted and the actual demand exceeded the machine capacity, because the machine on which it was running became unusable, or because a disk holding the task’s data was lost; (3) kill event: a task or job was canceled, or another job or task on which it depended died; (4) lost event: a task or job was presumably terminated, but the record of its termination is missing.
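The four interruption causes can be tallied per task as follows. This is a minimal sketch under stated assumptions: the string labels (`"fail"`, `"evict"`, and so on) and the function `count_interruptions` are hypothetical, and the trace itself may encode event types differently (see [4] for the actual schema).

```python
from collections import Counter
from enum import Enum


class Interruption(Enum):
    """The four interruption causes listed in the text (labels are assumed)."""
    FAIL = "fail"    # descheduled due to a task failure
    EVICT = "evict"  # preempted, overcommitted, or machine/disk lost
    KILL = "kill"    # canceled, or a dependency died
    LOST = "lost"    # presumably terminated, record missing


def count_interruptions(event_names):
    """Count interruption events, ignoring every other event name."""
    known = {member.value: member for member in Interruption}
    counts = Counter()
    for name in event_names:
        member = known.get(name.lower())
        if member is not None:
            counts[member] += 1
    return counts


# Hypothetical event stream for a single task
events = ["submit", "schedule", "evict", "schedule", "fail", "schedule", "finish"]
counts = count_interruptions(events)
print({member.name: n for member, n in counts.items()})
# prints {'EVICT': 1, 'FAIL': 1}
```

Keeping the causes in an enumeration makes it harder to confuse, for example, an eviction with a failure when aggregating statistics over many tasks.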

1. Armbrust M, Fox A, Griffith R, Joseph A et al (2009) Above the clouds: a Berkeley view of cloud computing. EECS, University of California, Berkeley, Technical Report UCB/EECS-2009-28

2. Vaquero L, Rodero-Merino L, Caceres J, Lindner M (2009) A break in the clouds: towards a cloud definition. SIGCOMM Comput Commun Rev 39(1):50–55

3. Wilkes J (2011) More Google cluster data. Google research blog. http://googleresearch.blogspot.com/2011/11/more-google-cluster-data.html

4. Reiss C, Wilkes J, Hellerstein J (2012) Google cluster-usage traces: format + schema. Google Inc., Mountain View, USA, Technical Report

5. Di S, Kondo D, Cirne W (2012) Characterization and comparison of cloud versus grid workloads. In: IEEE international conference on cluster computing (Cluster’12), pp 230–238

6. Meng X, Isci C, Kephart J, Zhang L, Bouillet E, Pendarakis D (2010) Efficient resource provisioning in compute clouds via VM multiplexing. In: Proceedings of the 7th international conference on autonomic computing (ICAC’10), New York, ACM, pp 11–20

7. Buyya R, Ranjan R, Calheiros R (2010) Intercloud: utility-oriented federation of cloud computing environments for scaling of application services. In: 10th international conference on algorithms and architectures for parallel processing (ICA3PP’10), pp 13–31

8. Stillwell M, Vivien F, Casanova H (2012) Virtual machine resource allocation for service hosting on heterogeneous distributed platforms. In: Proceedings of IEEE 26th international conference on parallel distributed processing symposium (IPDPS’12), pp 786–797

9. Calheiros R, Ranjan R, Beloglazov A, De-Rose C, Buyya R (2011) Cloudsim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Softw Pract Exp 41(1):23–50

10. Di S, Wang C-L (2013) Dynamic optimization of multi-attribute resource allocation in self-organizing clouds. IEEE Trans Parallel Distrib Syst (TPDS) 24(3):464–478

11. Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters. In: 5th USENIX symposium on operating systems design and implementation (OSDI’04), pp 137–150

12. Reiss C, Tumanov A, Ganger G, Katz R, Kozuch M (2012) Towards understanding heterogeneous clouds at scale: Google trace analysis. Intel science and technology center for cloud computing. Carnegie Mellon University, Pittsburgh, Technical Report ISTC-CC-TR-12-101

13. Feitelson D (2011) Workload modeling for computer systems performance evaluation. http://www.cs.huji.ac.il/~feit/wlmod/

14. Koch R (1997) The 80/20 principle: the secret of achieving more with less. Nicholas Brealey

15. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp 281–297

16. Okabe A, Boots B, Sugihara K, Chiu S (2000) Spatial tessellations: concepts and applications of Voronoi diagrams, 2nd edn. Series in probability and statistics. Wiley, England

17. Ross S (2010) Introduction to probability models, 10th edn. Academic Press, Burlington

18. Sharma B, Chudnovsky V, Hellerstein J, Rifaat R, Das C (2011) Modeling and synthesizing task placement constraints in Google compute clusters. In: Proceedings of the 2nd ACM symposium on cloud computing (SOCC’11), New York, ACM, pp 3:1–3:14

19. Mishra A, Hellerstein J, Cirne W, Das C-R (2010) Towards characterizing cloud backend workloads: insights from Google compute clusters. SIGMETRICS Perform Eval Rev 37(4):34–41

20. Zhang Q, Hellerstein JL, Boutaba R (2011) Characterizing task usage shapes in Google compute clusters. In: Large scale distributed systems and middleware workshop (LADIS’11)

21. Liu Z, Cho S (2012) Characterizing machines and workloads on a Google cluster. In: 8th international workshop on scheduling and resource management for parallel and distributed systems (SRMPDS’12), pp 397–403

22. Ganapathi A, Chen Y, Fox A, Katz RH, Patterson DA (2010) Statistics-driven workload modeling for the cloud. In: ICDE workshops’10, pp 87–92

23. Shvachko K, Kuang H, Radia S, Chansler R (2010) The Hadoop distributed file system. In: IEEE 26th symposium on mass storage systems and technologies (MSST’10), pp 1–10

24. Li A, Zong X, Kandula S, Yang X, Zhang M (2011) Cloudprophet: towards application performance prediction in cloud. ACM SIGCOMM student poster, pp 426–427

25. Jackson KR, Ramakrishnan L, Muriki K et al (2010) Performance analysis of high performance computing applications on the Amazon Web Services cloud. In: Proceedings of the IEEE 2nd international conference on cloud computing technology and science (CloudCom’10), Washington, DC, IEEE Computer Society, pp 159–168

26. Hamerly G, Elkan C (2002) Alternatives to the k-means algorithm that find better clusterings. In: Proceedings of the 17th international conference on information and knowledge management (CIKM’02), New York, ACM, pp 600–607

Title: Characterizing and modeling cloud applications/jobs on a Google data center
Authors: Sheng Di, Derrick Kondo, Franck Cappello
Publication date: 1 July 2014
Publisher: Springer US
Published in: The Journal of Supercomputing / Issue 1/2014
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-014-1131-z
