Published in: The Journal of Supercomputing 10/2021

22.03.2021

An efficient parallel indexing structure for multi-dimensional big data using spark

Authors: Manar A. Elmeiligy, Ali I. El Desouky, Sally M. Elghamrawy


Abstract

With the growing volume of data produced every day, indexing, storing and retrieving huge amounts of data has become a common problem, especially for multi-dimensional big data. Although the R-tree has proved efficient for indexing multi-dimensional big data, it suffers from the curse of dimensionality. Many researchers still use the R-tree because it is the best-known tree-like structure for indexing multi-dimensional data; however, its performance degrades as the number of dimensions increases. To overcome these problems, this paper proposes a new indexing structure, the Parallel Indexing System Structure based on Spark (ParISSS), an efficient system for indexing multi-dimensional big data. ParISSS introduces six types of computing nodes: the reception-node inserts and indexes data, the normal-node stores indexed data, the resolution-node distributes a reception-index to a normal-node, the representative-node receives queries from the user, and the reply-node and check-node send the results to the user. We also introduce the BR*-tree structure to improve the storing and searching processes. We present an extensive experimental evaluation comparing ParISSS with several indexing systems, and the results show that ParISSS outperforms the other indexing systems.
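The abstract names the node roles but not how they cooperate on a query. The following is a minimal single-process sketch, assuming a conventional MBR-based (minimum bounding rectangle) partition-pruning scheme of the kind used by R-tree-family indexes; it is not the authors' implementation. The names `reception_node`, `resolution_node`, `representative_node`, `NormalNode` and `MBR`, the sort-then-batch partitioning, and the one-batch-per-node placement are all hypothetical. The reply-node and check-node roles are folded into a single merge step, and no Spark API is used. A real deployment would run each role on separate cluster workers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class MBR:
    """Axis-aligned minimum bounding rectangle in d dimensions."""
    low: List[float]
    high: List[float]

    @classmethod
    def of(cls, points: Sequence[Point]) -> "MBR":
        dims = range(len(points[0]))
        return cls([min(p[i] for p in points) for i in dims],
                   [max(p[i] for p in points) for i in dims])

    def intersects(self, other: "MBR") -> bool:
        # Two boxes overlap iff their intervals overlap in every dimension.
        return all(lo1 <= hi2 and lo2 <= hi1
                   for lo1, hi1, lo2, hi2 in zip(self.low, self.high, other.low, other.high))

    def contains(self, p: Point) -> bool:
        return all(lo <= x <= hi for lo, x, hi in zip(self.low, p, self.high))


@dataclass
class NormalNode:
    """Hypothetical storage node: keeps one batch of points and its MBR."""
    points: List[Point] = field(default_factory=list)
    mbr: Optional[MBR] = None

    def store(self, batch: List[Point]) -> None:
        self.points.extend(batch)
        self.mbr = MBR.of(self.points)

    def scan(self, query: MBR) -> List[Point]:
        return [p for p in self.points if query.contains(p)]


def reception_node(points: List[Point], batch_size: int) -> List[List[Point]]:
    """Hypothetical: sort incoming points and cut them into batches.
    Sorting stands in for whatever indexing ParISSS actually performs."""
    ordered = sorted(points)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def resolution_node(batches: List[List[Point]]) -> List[NormalNode]:
    """Hypothetical: hand each batch to its own normal node."""
    nodes = []
    for batch in batches:
        node = NormalNode()
        node.store(batch)
        nodes.append(node)
    return nodes


def representative_node(nodes: List[NormalNode], query: MBR) -> Tuple[List[Point], int]:
    """Hypothetical: probe only nodes whose MBR overlaps the query, then
    deduplicate and sort the partial answers (the reply/check step)."""
    probed = [n for n in nodes if n.mbr is not None and n.mbr.intersects(query)]
    results = sorted({p for n in probed for p in n.scan(query)})
    return results, len(probed)


if __name__ == "__main__":
    data = [(float(i), float(i % 5)) for i in range(20)]
    nodes = resolution_node(reception_node(data, batch_size=5))
    hits, probed = representative_node(nodes, MBR([2.0, 0.0], [7.0, 2.0]))
    print(hits, probed)
```

Running the script prints `[(2.0, 2.0), (5.0, 0.0), (6.0, 1.0), (7.0, 2.0)] 2`: only the two normal nodes whose bounding boxes overlap the query are scanned. This overlap test is also where the curse of dimensionality bites, because boxes built over high-dimensional data tend to overlap almost any query, so fewer nodes can be skipped.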


Metadata
Title
An efficient parallel indexing structure for multi-dimensional big data using spark
Authors
Manar A. Elmeiligy
Ali I. El Desouky
Sally M. Elghamrawy
Publication date
22.03.2021
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 10/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-021-03718-3
