Published in: The Journal of Supercomputing 10/2017

06-04-2017

A partitioning framework for Cassandra NoSQL database using Rendezvous hashing

Authors: Sally M. Elghamrawy, Aboul Ella Hassanien


Abstract

The steady growth of data volumes in social networks and cloud computing gave rise to the term "Big data" and to the challenge of storing immense datasets. Many tools and algorithms have been developed to address this challenge. NoSQL databases, such as Cassandra and MongoDB, are built on data management designs that can store and process huge volumes of data. Partitioning data across nodes is considered one of the critical challenges in NoSQL database design. This paper proposes a MapReduce Rendezvous Hashing-Based Virtual Hierarchies (MR-RHVH) framework for scalable partitioning of the Cassandra NoSQL database. MR-RHVH is implemented on Cassandra using the MapReduce framework to enhance its performance in highly distributed environments. MR-RHVH assigns nodes to rendezvous regions according to a proposed Adopted Virtual Hierarchies strategy, so that each region is responsible for a set of nodes. In addition, a proposed bloom filter evaluator is used to ensure that keys are allocated accurately to the nodes in each region. A number of experiments were performed to evaluate the performance of the MR-RHVH framework, using YCSB for database benchmarking. The results show that MR-RHVH scales well and takes less time than several recent systems.
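To make the terms in the abstract concrete, the following is a minimal sketch of generic two-level rendezvous (highest-random-weight) hashing, with a small Bloom filter for approximate membership checks. It is not the MR-RHVH implementation. The region and node names, the `locate` helper, the SHA-256 scoring function, and the Bloom filter sizing are all assumptions chosen for illustration.

```python
import hashlib


def score(member: str, key: str) -> int:
    """Deterministic 64-bit weight for a (member, key) pair."""
    digest = hashlib.sha256(f"{member}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def rendezvous_pick(members, key):
    """Rendezvous hashing: the member with the highest weight owns the key."""
    return max(members, key=lambda m: score(m, key))


def locate(regions, key):
    """Two-level lookup: choose a region first, then a node inside it."""
    region = rendezvous_pick(sorted(regions), key)
    node = rendezvous_pick(regions[region], key)
    return region, node


class BloomFilter:
    """Approximate set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Double hashing: derive all probe positions from one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


# Hypothetical cluster layout, used only for this illustration.
REGIONS = {
    "region-a": ["node-1", "node-2", "node-3"],
    "region-b": ["node-4", "node-5"],
    "region-c": ["node-6", "node-7", "node-8"],
}

if __name__ == "__main__":
    region, node = locate(REGIONS, "user:42")
    seen = BloomFilter()
    seen.add("user:42")
    print(region, node, seen.might_contain("user:42"))
```

A useful property of this scheme is that removing a node only relocates the keys that node owned. Every other key keeps its highest-scoring node, so its placement does not change.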


Metadata
Title: A partitioning framework for Cassandra NoSQL database using Rendezvous hashing
Authors: Sally M. Elghamrawy, Aboul Ella Hassanien
Publication date: 06-04-2017
Publisher: Springer US
Published in: The Journal of Supercomputing, Issue 10/2017
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-017-2027-5
