Published in: Cluster Computing 4/2017

08.08.2017

Data storage optimization strategy in distributed column-oriented database by considering spatial adjacency

Authors: Kun Zheng, Danpeng Gu, Falin Fang, Miao Zhang, Kang Zheng, Qi Li


Abstract

In a distributed column-oriented database, a scan operation can involve many fragments and trigger many unnecessary partition queries, which seriously degrades query efficiency, especially for spatial data. To address this problem, this paper builds on the partitioning strategy of distributed column-oriented databases and proposes a spatial data storage optimization strategy named ‘SPPS’. The strategy takes spatial adjacency into account so that adjacent spatial objects are stored in the same data fragment, and it retains the spatial information of each fragment. A spatial query can therefore locate the relevant fragments from this per-fragment spatial information, which reduces unnecessary partition scans and improves storage and query efficiency. To verify the ‘SPPS’ strategy, the paper reports experiments on HBase that record spatial query efficiency with and without ‘SPPS’. The experimental results indicate that the ‘SPPS’ strategy can optimize storage and query efficiency in distributed column-oriented databases.
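
The general idea described in the abstract (co-locating adjacent objects and keeping spatial information per fragment so that queries can skip irrelevant fragments) can be illustrated with a small sketch. This is not the SPPS algorithm itself, whose details are not given here. It assumes point data, uses a Z-order (Morton) key as a stand-in for "spatial adjacency", and uses a bounding box as the per-fragment spatial information. All names (`Fragment`, `build_fragments`, `range_query`) are hypothetical, and no HBase API is involved.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def morton_key(ix: int, iy: int, bits: int = 16) -> int:
    """Interleave the bits of two grid indices into one Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


@dataclass
class Fragment:
    points: List[Point]
    mbr: Box  # assumed form of the "spatial information" kept per fragment


def bounding_box(points: List[Point]) -> Box:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def build_fragments(points: List[Point], capacity: int, extent: Box,
                    bits: int = 16) -> List[Fragment]:
    """Sort points along a Z-order curve and cut the order into fragments,
    so that points close in space tend to share a fragment."""
    min_x, min_y, max_x, max_y = extent
    cells = (1 << bits) - 1

    def key(p: Point) -> int:
        ix = int((p[0] - min_x) / (max_x - min_x) * cells)
        iy = int((p[1] - min_y) / (max_y - min_y) * cells)
        return morton_key(ix, iy, bits)

    ordered = sorted(points, key=key)
    fragments = []
    for i in range(0, len(ordered), capacity):
        chunk = ordered[i:i + capacity]
        fragments.append(Fragment(chunk, bounding_box(chunk)))
    return fragments


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(fragments: List[Fragment], window: Box):
    """Return matching points and the number of fragments actually scanned."""
    hits: List[Point] = []
    scanned = 0
    for frag in fragments:
        if not intersects(frag.mbr, window):
            continue  # pruned using the fragment's spatial information
        scanned += 1
        hits.extend(p for p in frag.points
                    if window[0] <= p[0] <= window[2]
                    and window[1] <= p[1] <= window[3])
    return hits, scanned
```

In this sketch, a query window that overlaps only one fragment's bounding box causes only that fragment to be scanned, which is the effect the abstract attributes to keeping spatial information per fragment.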


Metadata
Title
Data storage optimization strategy in distributed column-oriented database by considering spatial adjacency
Authors
Kun Zheng
Danpeng Gu
Falin Fang
Miao Zhang
Kang Zheng
Qi Li
Publication date
08.08.2017
Publisher
Springer US
Published in
Cluster Computing / Issue 4/2017
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1081-3
