
Cluster Computing

Published: 28-03-2017

Self-organized dynamic provisioning for big data

Author: D. Cenk Erdil

Published in: Cluster Computing | Issue 3/2017


Abstract

The recent rapid growth of datasets in big data problems has produced data sizes that exceed the processing capabilities of the available distributed computing power. In other words, we are producing more data than we can process. In addition, further analysis of a dataset's collective state may require duplicating, transferring, and distributing the data, which increases the scale of the problem even further. Orchestrating these steps in large-scale complex systems is non-trivial. One basic technique for minimizing the effects of data redistribution is to use dynamic resource provisioning environments. When node organization and structure are dynamic and heterogeneous, such environments require up-to-date information about resource availability. Keeping this resource-state information fresh in centralized or hierarchical scheduling systems imposes network communication overhead, and centralization also introduces administrative barriers that limit interoperability. One effective way to increase the degree of self-organization is to use feedback: based on it, nodes can alter their behavior to respond better to the changing characteristics of dynamic resource provisioning environments. In this article, we present a decentralized scheduling framework that takes feedback from the system and adjusts its behavior accordingly. The framework provides an enabling mechanism for self-organization, in which each cloud node adapts its behavior based on this feedback. Compared with the centralized resource provisioning solutions in current cloud systems, this approach makes comparable scheduling decisions with half the packet overhead. We show that by exploiting spatial locality through dynamic provisioning, and by making better scheduling decisions with our framework, the data processing overhead of big data problems can be reduced by at least 30% in general, and by up to 55% for particular resource distributions. This, in turn, yields more efficient scheduling decisions that provision better resources for big data tasks.
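The abstract does not specify the feedback rule each node applies. As a rough illustration of the general idea only, the following sketch assumes a hypothetical node that periodically exchanges resource state with its peers and uses a single feedback signal: whether a scheduling decision made from its cached state turned out to be stale. The class `AdaptiveGossipNode`, all of its parameters, and the additive-increase/multiplicative-decrease style update (borrowed from general congestion-control practice) are assumptions, not the framework described in the article.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AdaptiveGossipNode:
    """Hypothetical node that tunes how often it refreshes peer resource state.

    Assumed feedback signal: whether a scheduling decision based on cached
    state was wrong (e.g. the chosen resource was no longer available).
    """

    interval: float = 8.0           # seconds between state exchanges (hypothetical)
    min_interval: float = 1.0       # never refresh faster than this
    max_interval: float = 60.0      # never refresh slower than this
    target_miss_rate: float = 0.1   # tolerated fraction of stale-state misses
    window: int = 10                # decisions collected per adaptation step
    _outcomes: List[bool] = field(default_factory=list)

    def record_decision(self, was_stale: bool) -> None:
        """Record feedback for one decision; adapt once a full window is collected."""
        self._outcomes.append(was_stale)
        if len(self._outcomes) >= self.window:
            self._adapt()
            self._outcomes.clear()

    def _adapt(self) -> None:
        """Shrink the interval quickly on too many misses, grow it slowly otherwise."""
        miss_rate = sum(self._outcomes) / len(self._outcomes)
        if miss_rate > self.target_miss_rate:
            # Stale information is hurting decisions: refresh twice as often.
            self.interval = max(self.min_interval, self.interval / 2)
        else:
            # Cached state is fresh enough: back off to send fewer packets.
            self.interval = min(self.max_interval, self.interval + 1.0)


if __name__ == "__main__":
    node = AdaptiveGossipNode()
    for _ in range(10):
        node.record_decision(was_stale=True)   # a window of stale-state misses
    print(node.interval)  # 4.0
    for _ in range(10):
        node.record_decision(was_stale=False)  # a window of accurate decisions
    print(node.interval)  # 5.0
```

Halving the interval after a bad window lets a node react quickly when its view of the system goes stale, while the slow additive back-off trades some freshness for fewer exchanged packets, which is the kind of overhead the abstract is concerned with.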


Except in the case where nodes self-organize into neighborhoods in a peer-to-peer fashion.

When Freshness is used as the ranking criterion.
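One of the footnotes mentions ranking candidate resources by freshness. A minimal sketch of what such a ranking could look like, assuming each cached advertisement records the time at which its state was observed; `ResourceAd`, `free_cpus`, `observed_at`, and `rank_by_freshness` are hypothetical names, and filtering by CPU count is an illustrative choice rather than the article's matching criteria.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ResourceAd:
    """Hypothetical cached advertisement of a peer's resource state."""
    node_id: str
    free_cpus: int
    observed_at: float  # time (seconds) at which this state was observed


def rank_by_freshness(ads: Iterable[ResourceAd], now: float,
                      required_cpus: int) -> List[ResourceAd]:
    """Return ads that can satisfy the request, freshest (smallest age) first."""
    eligible = [ad for ad in ads if ad.free_cpus >= required_cpus]
    # A smaller age means the advertised state is more likely to still hold.
    return sorted(eligible, key=lambda ad: now - ad.observed_at)


if __name__ == "__main__":
    cache = [
        ResourceAd("a", free_cpus=4, observed_at=90.0),
        ResourceAd("b", free_cpus=8, observed_at=99.0),
        ResourceAd("c", free_cpus=1, observed_at=100.0),
    ]
    ranked = rank_by_freshness(cache, now=100.0, required_cpus=2)
    print([ad.node_id for ad in ranked])  # prints ['b', 'a']
```

Here node "c" is the freshest entry but is filtered out because it cannot satisfy the request; among the remaining candidates, "b" (age 1 s) ranks ahead of "a" (age 10 s).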


Title: Self-organized dynamic provisioning for big data
Author: D. Cenk Erdil
Publication date: 28-03-2017
Publisher: Springer US
Published in: Cluster Computing / Issue 3/2017
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-017-0822-7
