
2017 | OriginalPaper | Chapter

Sphinx: Empowering Impala for Efficient Execution of SQL Queries on Big Spatial Data

Authors: Ahmed Eldawy, Ibrahim Sabek, Mostafa Elganainy, Ammar Bakeer, Ahmed Abdelmotaleb, Mohamed F. Mokbel

Published in: Advances in Spatial and Temporal Databases

Publisher: Springer International Publishing

Abstract

This paper presents Sphinx, a full-fledged open-source system for big spatial data that overcomes the limitations of existing systems by adopting a standard SQL interface and by providing a highly efficient engine built inside the core of the Apache Impala system. Sphinx is composed of four main layers: query parser, indexer, query planner, and query executor. The query parser injects spatial data types and functions into the SQL interface of Sphinx. The indexer creates spatial indexes in Sphinx by adopting a two-layered index design. The query planner utilizes these indexes to construct efficient query plans for range query and spatial join operations. Finally, the query executor carries out these plans on big spatial datasets in a distributed cluster. A system prototype of Sphinx running on real datasets shows up to three orders of magnitude performance improvement over plain-vanilla Impala, SpatialHadoop, and PostGIS.
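To make the two-layered index idea concrete, the following is a minimal single-machine sketch of a range query over such an index. It is not Sphinx code. The abstract does not say what the two layers are, so the sketch assumes a global layer that splits space into grid partitions and a local layer that keeps each partition's points sorted by x. The names `Partition`, `build_index`, and `range_query` are hypothetical, and the uniform grid is a stand-in for whatever partitioning the system actually uses.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two closed rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Partition:
    """One cell of the (assumed) global layer, holding its own local layer."""
    mbr: Rect
    points: List[Point] = field(default_factory=list)  # kept sorted by x


def build_index(points: List[Point], space: Rect, n: int) -> List[Partition]:
    """Global layer: an n-by-n uniform grid. Local layer: points sorted by x."""
    xmin, ymin, xmax, ymax = space
    w, h = (xmax - xmin) / n, (ymax - ymin) / n
    grid = [
        [Partition((xmin + i * w, ymin + j * h,
                    xmin + (i + 1) * w, ymin + (j + 1) * h))
         for j in range(n)]
        for i in range(n)
    ]
    for p in points:
        # Clamp so points on the upper border fall into the last cell.
        i = min(int((p[0] - xmin) / w), n - 1)
        j = min(int((p[1] - ymin) / h), n - 1)
        grid[i][j].points.append(p)
    index = [part for column in grid for part in column]
    for part in index:
        part.points.sort()
    return index


def range_query(index: List[Partition], query: Rect) -> List[Point]:
    """Return all indexed points that lie inside the closed query rectangle."""
    result: List[Point] = []
    for part in index:
        # Global layer: skip partitions that cannot contain any answer.
        if not intersects(part.mbr, query):
            continue
        # Local layer: binary search narrows the scan to the x-range.
        lo = bisect_left(part.points, (query[0], float("-inf")))
        hi = bisect_right(part.points, (query[2], float("inf")))
        result.extend(p for p in part.points[lo:hi]
                      if query[1] <= p[1] <= query[3])
    return result


points = [(1, 1), (2, 8), (5, 5), (9, 9), (6, 4)]
index = build_index(points, space=(0, 0, 10, 10), n=2)
hits = sorted(range_query(index, (4, 3, 7, 6)))
```

In a distributed setting, the global layer would decide which partitions (and therefore which machines) take part in a query, while the local layer would speed up the work inside each partition. Here both steps run in one process purely for illustration.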

Footnotes
1
The idea of Sphinx was first introduced as a poster [13].
 
Literature
2.
Auchincloss, A., et al.: A review of spatial methods in epidemiology: 2000–2010. Annu. Rev. Public Health 33, 107–122 (2012)
3.
Faghmous, J., Kumar, V.: Spatio-temporal data mining for climate data: advances, challenges, and opportunities. In: Chu, W. (ed.) Data Mining and Knowledge Discovery for Big Data. Studies in Big Data, vol. 1, pp. 83–116. Springer, Heidelberg (2014). doi:10.1007/978-3-642-40837-3_3
4.
Sankaranarayanan, J., Samet, H., Teitler, B.E., Sperling, M.: TwitterStand: news in tweets. In: SIGSPATIAL (2009)
5.
Aji, A., et al.: Hadoop-GIS: a high performance spatial data warehousing system over MapReduce. In: VLDB (2013)
6.
Eldawy, A., Mokbel, M.F.: SpatialHadoop: a MapReduce framework for spatial data. In: ICDE (2015)
7.
Nishimura, S., et al.: \({\cal{MD}}\)-HBase: design and implementation of an elastic data infrastructure for cloud-scale location services. DAPD 31(2), 289–319 (2013)
8.
Nidzwetzki, J.K., Güting, R.H.: Distributed SECONDO: a highly available and scalable system for spatial data processing. In: Claramunt, C., Schneider, M., Wong, R.C.-W., Xiong, L., Loh, W.-K., Shahabi, C., Li, K.-J. (eds.) SSTD 2015. LNCS, vol. 9239, pp. 491–496. Springer, Cham (2015). doi:10.1007/978-3-319-22363-6_28
9.
Fox, A., et al.: Spatio-temporal indexing in non-relational distributed databases. In: International Conference on Big Data (2013)
10.
Yu, J., et al.: A demonstration of GeoSpark: a cluster computing framework for processing big spatial data. In: ICDE (2016)
11.
Xie, D., et al.: Simba: efficient in-memory spatial analytics. In: SIGMOD, San Francisco, CA, June 2016
12.
Whitman, R.T., et al.: Spatial indexing and analytics on hadoop. In: SIGSPATIAL (2014)
13.
Eldawy, A., et al.: Sphinx: distributed execution of interactive SQL queries on big spatial data (Poster). In: SIGSPATIAL (2015)
14.
Kornacker, M., et al.: Impala: a modern, open-source SQL engine for Hadoop. In: CIDR (2015)
15.
Wanderman-Milne, S., Li, N.: Runtime code generation in cloudera impala. IEEE Data Eng. Bull. 37(1), 31–37 (2014)
16.
Floratou, A., et al.: SQL-on-hadoop: full circle back to shared-nothing database architectures. In: PVLDB (2014)
17.
Thusoo, A., et al.: Hive: a warehousing solution over a map-reduce framework. In: PVLDB (2009)
18.
Armbrust, M., et al.: Spark SQL: relational data processing in spark. In: SIGMOD (2015)
19.
Schnitzer, B., Leutenegger, S.T.: Master-client r-trees: a new parallel r-tree architecture. In: SSDBM (1999)
20.
DeWitt, D., Gray, J.: Parallel database systems: the future of high performance database systems. In: CACM (1992)
21.
Eldawy, A., Alarabi, L., Mokbel, M.F.: Spatial partitioning techniques in SpatialHadoop. In: PVLDB (2015)
22.
Yu, J., et al.: GeoSpark: a cluster computing framework for processing large-scale spatial data. In: SIGSPATIAL (2015)
23.
Leutenegger, S., et al.: STR: a simple and efficient algorithm for R-tree packing. In: ICDE (1997)
24.
den Bercken, J.V., et al.: The bulk index join: a generic approach to processing non-equijoins. In: ICDE (1999)
25.
Patel, J., DeWitt, D.: Partition based spatial-merge join. In: SIGMOD (1996)
26.
Dittrich, J.P., Seeger, B.: Data redundancy and duplicate detection in spatial join processing. In: ICDE (2000)
27.
Brinkhoff, T., Kriegel, H., Seeger, B.: Efficient processing of spatial joins using R-trees. In: SIGMOD, pp. 237–246 (1993)
28.
Arge, L., et al.: Scalable sweeping-based spatial join. In: VLDB (1998)
29.
Zhang, S., et al.: SJMR: parallelizing spatial join with MapReduce on clusters. In: CLUSTER, pp. 1–8 (2009)
30.
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51, 107–113 (2008)
31.
Olston, C., et al.: Pig latin: a not-so-foreign language for data processing. In: SIGMOD (2008)
32.
Zaharia, M., et al.: Spark: cluster computing with working sets. In: HotCloud (2010)
33.
34.
Zhang, S., et al.: Spatial queries evaluation with MapReduce. In: GCC, pp. 287–292 (2009)
35.
Ma, Q., Yang, B., Qian, W., Zhou, A.: Query processing of massive trajectory data based on MapReduce. In: CLOUDDB (2009)
36.
Akdogan, A., et al.: Voronoi-based geospatial query processing with MapReduce. In: CLOUDCOM (2010)
37.
You, S., et al.: Large-scale spatial join query processing in cloud. In: CLOUDDM (2015)
38.
Stonebraker, M., et al.: SciDB: a database management system for applications with complex analytics. Comput. Sci. Eng. 15(3), 54–62 (2013)
39.
Wang, G., et al.: Behavioral Simulations in MapReduce. In: PVLDB (2010)
40.
Lu, J., Guting, R.H.: Parallel secondo: boosting database engines with Hadoop. In: ICPADS (2012)
Metadata
Title
Sphinx: Empowering Impala for Efficient Execution of SQL Queries on Big Spatial Data
Authors
Ahmed Eldawy
Ibrahim Sabek
Mostafa Elganainy
Ammar Bakeer
Ahmed Abdelmotaleb
Mohamed F. Mokbel
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-64367-0_4