nach oben

The VLDB Journal

Erschienen in:

01.12.2014 | Special Issue Paper

SWORD: workload-aware data placement and replica selection for cloud data management systems

verfasst von: K. Ashwin Kumar, Abdul Quamar, Amol Deshpande, Samir Khuller

Erschienen in: The VLDB Journal | Ausgabe 6/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Cloud computing is increasingly being seen as a way to reduce infrastructure costs and add elasticity, and is being used by a wide range of organizations. Cloud data management systems today need to serve a range of different workloads, from analytical read-heavy workloads to transactional (OLTP) workloads. For both the service providers and the users, it is critical to minimize the consumption of resources like CPU, memory, communication bandwidth, and energy, without compromising on service-level agreements if any. In this article, we develop a workload-aware data placement and replication approach, called SWORD, for minimizing resource consumption in such an environment. Specifically, we monitor and model the expected workload as a hypergraph and develop partitioning techniques that minimize the average query span, i.e., the average number of machines involved in the execution of a query or a transaction. We empirically justify the use of query span as the metric to optimize, for both analytical and transactional workloads, and develop a series of replication and data placement algorithms by drawing connections to several well-studied graph theoretic concepts. We introduce a suite of novel techniques to achieve high scalability by reducing the overhead of partitioning and query routing. To deal with workload changes, we propose an incremental repartitioning technique that modifies data placement in small steps without resorting to complete repartitioning. We propose the use of fine-grained quorums defined at the level of groups of data items to control the cost of distributed updates, improve throughput, and adapt to different workloads. We empirically illustrate the benefits of our approach through a comprehensive experimental evaluation for two classes of workloads. For analytical read-only workloads, we show that our techniques result in significant reduction in total resource consumption. For OLTP workloads, we show that our approach improves transaction latencies and overall throughput by minimizing the number of distributed transactions.

Vorheriger Artikel Special section on data-intensive cloud infrastructure

Nächster Artikel ACME: A scalable parallel system for extracting frequent patterns from a very long sequence

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

The Cartesian product of the full set of attributes forming the unique identifier.

A majority of hyperedges in the cut of our compressed hypergraph representing TPC-C, a typical OLTP workload, span two partitions.

It does not include the query routing time.

MLPart: a hypergraph partitioning package. URL http://vlsicad.ucsd.edu/GSRC/bookshelf/Slots/Partitioning/MLPart/

hMETIS: a hypergraph partitioning package. URL http://glaros.dtc.umn.edu/gkhome/metis/hmetis/overview

Alon, N., Seymour, P.D., Thomas, R.: A separator theorem for graphs with an excluded minor and its applications. In: STOC (1990)

Alpert, C.J.: The ISPD98 circuit benchmark suite. In: Proc. of Intl, Symposium on Physical Design (1998)

Asahiro, Y., Iwama, K., Tamaki, H., Tokuyama, T.: Greedily finding a dense subgraph. In: SWAT (1996)

Ayka, C., Cambazoglu, B., Bora, U.: Multi-level direct k-way hypergraph partitioning with multiple constraints and fixed vertices. J. Parallel Distrib. Comput. 68, 609–625 (2008)

Boppana, R.B.: Eigenvalues and graph bisection: an average-case analysis. In: FOCS (1987)

Bruno, N., Chaudhuri, S., Konig, A.C., Narasayya, V.R., Ramamurthy, R., Syamala, M.: Autoadmin project at Microsoft research: lessons learned. IEEE Data Eng. Bull. 34, 12–19 (2011)

Caldwell, A.E., Kahng, A.B., Markov, I.L.: Design and implementation of move-based heuristics for VLSI hypergraph partitioning. J. Exp. Algorithmics. 5, 5 (2000)CrossRef

10.

Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: a distributed storage system for structured data. In: OSDI (2006)

11.

Chowdhury, M., Zaharia, M., Ma, J., Jordan, M.I., Stoica., I.: Managing data transfers in computer clusters with orchestra. In: SIGCOMM (2011)

12.

Curino, C., Zhang, Y., Jones, E.P.C., Madden, S.: Schism: a workload-driven approach to database replication and partitioning. PVLDB 3(1), 48–57 (2010)

13.

J. Dittrich, Ruiz, J.A.Q., Jindal, A., Kargin, Y., Setty, V., Schad, J.: Hadoop++: making a yellow elephant run like a cheetah (without it even noticing). In: PVLDB, vol. 3, pp. 515–529 (2010)

14.

Economou, D., Rivoire, S., Kozyrakis, C.: Full-system power analysis and modeling for server environments. In: MOBS (2006)

15.

Eltabakh, M.Y., Tian, Y., Ozcan, F., Gemulla, R., Krettek, A., McPherson, J.: Cohadoop: flexible data placement and its exploitation in hadoop. PVLDB 4(9), 575–585 (2011)

16.

Feige, U.: A threshold of ln n for approximating set cover. J. ACM 45(4), 634–652 (1998)CrossRefMATHMathSciNet

17.

Feige, U., Kortsarz, G., Peleg, D.: The dense k-subgraph problem. Algorithmica 29, 410–421 (1999)

18.

Ferhatosmanoglu, H., Tosun, A., Ramachandran, A.: Replicated declustering of spatial data. In: PODS (2004)

19.

Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness (1979)

20.

Gray, J., Helland, P., O’Neil, P.E., Shasha, D.: The dangers of replication and a solution. In: SIGMOD (1996)

21.

Ho, L., Wu, J., Liu, P.: Optimal algorithms for cross-rack communication optimization in mapreduce framework. In: CLOUD (2011)

22.

Jones, E.P.C., Abadi, D.J., Madden, S.: Low overhead concurrency control for partitioned main memory databases. In: SIGMOD (2010)

23.

Karypis, G., Kumar, V.: Multilevel k-way hypergraph partitioning. In: Proc. of DAC, pp. 343–348 (1998)

24.

Karypis, G., Aggarwal, R., Kumar, V., Shekhar, S.: Multilevel hypergraph partitioning: application in VLSI domain. In: IEEE VLSI, pp. 69–529. (1999)

25.

Koyutürk, M., Aykanat, C.: Iterative-improvement-based declustering heuristics for multi-disk databases. Inf. Syst. 30, 47–70 (2005)

26.

Kumar, K.A., Quamar, A., Deshpande, A., Khuller, S.: Workload-Aware Data Placement and Replica Selection for Cloud Data Management Systems. University of Maryland Technical Report (2013)

27.

Lakshman, A., Malik, P.: Cassandra: structured storage system on a P2P network. In: PODC (2009)

28.

Liu, D., Shekhar, S.: Partitioning similarity graphs: a framework for declustering problems. Inf. Syst. 21,475–496 (1996)

29.

Meyerhenke, H., Monien, B., Sauerwald, T.: A new diffusion-based multilevel algorithm for computing graph partitions. J. Parallel Distrib. Comput. 69(9), 750–761 (2009)CrossRef

30.

Nehme, R., Bruno, N.: Automated partitioning design in parallel database systems. In: SIGMOD (2011)

31.

Neves, T., Drummond, L.A., Ochi, L., Albuquerque, C., Uchoa, E.: Solving replica placement and request distribution in content distribution networks. Electron. Notes Discret. Math. 36, 89–96 (2010)CrossRef

32.

Oktay, K.Y., Turk, A., Aykanat, C.: Selective replicated declustering for arbitrary queries. In: Euro-Par (2009)

33.

Pavlo, A., Curino, C., Zdonik, S.: Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems. In: SIGMOD (2012)

34.

Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., Stonebraker, M.: A comparison of approaches to large-scale data analysis. In: SIGMOD (2009)

35.

Peris, J.R., Martinez, M.P., Alonso, G., Kemme, B.: Are quorums an alternative for data replication? TODS 28(3), 257–294 (2003)CrossRef

36.

Peris, R.J., Martinez, M.P.: How to select a replication protocol according to scalability, availability and communication overhead. In: SRDS (2001)

37.

Peris, R.J., Martnez, M.P., Kemme, B., Alonso, G.: How to select a replication protocol according to scalability, availability, and communication overhead. In: SRDS (2001)

38.

Quamar, A., Kumar, K.A., Deshpande, A.: Sword: scalable workload-aware data placement for transactional workloads. In: EDBT (2013)

39.

Simon, H.D., Teng, S.: How good is recursive bisection? SIAM J. Sci. Comput. 18(5), 1436–1445 (1997)CrossRefMATHMathSciNet

40.

Tatarowicz, A.L., Curino, C., Jones, E.P.C., Madden, S.: Lookup tables: fine-grained partitioning for distributed databases. In: ICDE (2011)

41.

Thain, D., Livny, M.: Building reliable clients and servers. In: Foster, I., Kesselman, C. (eds.) The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, Burlington (2003)

42.

Tosun, A.A., Ferhatosmanoglu, H.: Optimal parallel I/O using replication. In: ICPP (1997)

43.

Tosun, A.S.: Replicated declustering for arbitrary queries. In: ACM Symposium on Applied Computing (2004)

44.

Wang, X., Smalter, A., Huan, J., Lushington, G.H.: G-hash: towards fast kernel-based similarity search in large graph databases. In: EDBT (2009)

45.

White, T.: Hadoop: The Definitive Guide. O’Reilly Media, 1st edn. ISBN:0596521979 (2009)

46.

Wolfson, O., Jajodia, S., Huang, Y.: An adaptive data replication algorithm. TODS 22, 255–314 (1997)CrossRef

Titel: SWORD: workload-aware data placement and replica selection for cloud data management systems
verfasst von: K. Ashwin Kumar
Abdul Quamar
Amol Deshpande
Samir Khuller
Publikationsdatum: 01.12.2014
Verlag: Springer Berlin Heidelberg
Erschienen in: The VLDB Journal / Ausgabe 6/2014
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI: https://doi.org/10.1007/s00778-014-0362-1

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 6/2014

Special section on data-intensive cloud infrastructure

ACME: A scalable parallel system for extracting frequent patterns from a very long sequence

Distributed snapshot isolation: global transactions pay globally, local transactions pay locally

The Stratosphere platform for big data analytics

MyBenchmark: generating databases for query workloads

Maximum error-bounded Piecewise Linear Representation for online stream approximation