ABSTRACT
Spatial join is a crucial operation in many spatial analysis applications in scientific and geographical information systems. Due to the compute-intensive nature of spatial predicate evaluation, spatial join queries can be slow even with a moderate sized dataset. Efficient parallelization of spatial join is therefore essential to achieve acceptable performance for many spatial applications. Technological trends, including the rising core count and increasingly large main memory, hold great promise in this regard. Previous parallel spatial join approaches tried to partition the dataset so that the number of spatial objects in each partition was as equal as possible. They also focused only on the filter step. However, when the more compute-intensive refinement step is included, significant processing skew may arise due to the uneven size of the objects. This processing skew significantly limits the achievable parallel performance of the spatial join queries, as the longest-running spatial partition determines the overall query execution time.
Our solution is SPINOJA, a skew-resistant parallel in-memory spatial join infrastructure. SPINOJA introduces MOD-Quadtree declustering, which partitions the spatial dataset such that the amount of computation demanded by each partition is equalized and the processing skew is minimized. We compare three work metrics used to create the partitions and three load-balancing strategies to assign the partitions to multiple cores. SPINOJA uses an in-memory column-store to store the spatial tables. Our evaluation shows that SPINOJA outperforms in-memory implementations of previous spatial join approaches by a significant margin and a recently proposed in-memory spatial join algorithm by an order of magnitude.
- A. Aboulnaga and J. F. Naughton. Accurate Estimation of the Cost of Spatial Selections. In ICDE, pages 123--134, 2000. Google ScholarDigital Library
- A. Aji, F. Wang, H. Vo, R. Lee, Q. Liu, X. Zhang, and J. Saltz. Hadoop-GIS: A High Performance Spatial Data Warehousing System over Mapreduce. PVLDB, pages 1009--1020, 2013. Google ScholarDigital Library
- J. L. Bentley and T. A. Ottmann. Algorithms for reporting and counting geometric intersections. IEEE Trans. Comput., 28(9):643--647, 1979. Google ScholarDigital Library
- T. Brinkhoff, H.-P. Kriegel, and B. Seeger. Efficient processing of spatial joins using R-trees. In SIGMOD, pages 237--246, 1993. Google ScholarDigital Library
- T. Brinkhoff, H. peter Kriegel, and B. Seeger. Parallel Processing of Spatial Joins Using R-trees. In ICDE, pages 258--265, 1996. Google ScholarDigital Library
- E. Clementini and P. Di Felice. A model for representing topological relationships between complex geometric features in spatial databases. Information Sciences, pages 121--136, 1996. Google ScholarDigital Library
- A. Colantonio and R. Di Pietro. CONCISE: COmpressed 'N' Composable Integer SEt. Information Processing Letters, 110:644--650, 2010. Google ScholarDigital Library
- M. J. Egenhofer and R. Franzosa. Point-set topological spatial relations. Intl Journal of GIS, pages 161--174, 1991.Google Scholar
- R. H. Güting. An introduction to spatial database systems. In VLDB, pages 357--399, 1994.Google Scholar
- E. H. Jacox and H. Samet. Spatial join techniques. ACM Transactions on Database Systems, 32(1), 2007. Google ScholarDigital Library
- N. Koudas and K. C. Sevcik. Size separation spatial join. In SIGMOD, pages 324--335, 1997. Google ScholarDigital Library
- S. Nobari, F. Tauheed, T. Heinis, P. Karras, S. Bressan, and A. Ailamaki. Touch: In-memory spatial join by hierarchical data-oriented partitioning. In SIGMOD, pages 701--712, 2013. Google ScholarDigital Library
- Open Geospatial Consortium. Simple Feature Access - Part 2: SQL Option. http://www.opengeospatial.org/standards/sfs.Google Scholar
- J. A. Orenstein. Spatial query processing in an object-oriented database system. In SIGMOD, pages 326--336, 1986. Google ScholarDigital Library
- J. M. Patel and D. J. DeWitt. Partition based spatial-merge join. In SIGMOD, pages 259--270, 1996. Google ScholarDigital Library
- J. M. Patel and D. J. DeWitt. Clone join and shadow join: two parallel spatial join algorithms. In SIGSPATIAL, pages 54--61, 2000. Google ScholarDigital Library
- F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, 1985. Google ScholarDigital Library
- S. Ray, B. Simion, and A. D. Brown. Jackpine: A benchmark to evaluate spatial database performance. In ICDE, pages 1139--1150, 2011. Google ScholarDigital Library
- S. Ray, B. Simion, A. D. Brown, and R. Johnson. A Parallel Spatial Data Analysis Infrastructure for the Cloud. In SIGSPATIAL, pages 274--283, 2013. Google ScholarDigital Library
- http://www.census.gov/geo/www/tiger.Google Scholar
- S. Zhang, J. Han, Z. Liu, K. Wang, and Z. Xu. SJMR: Parallelizing spatial join with MapReduce on clusters. In CLUSTER, pages 1--8, 2009.Google ScholarCross Ref
- X. Zhou, D. J. Abel, and D. Truffet. Data partitioning for parallel spatial join processing. Geoinformatica, pages 175--204, 1998. Google ScholarDigital Library
Index Terms
- Skew-resistant parallel in-memory spatial join
Recommendations
Multi-way spatial join selectivity for the ring join graph
Efficient spatial query processing is very important since the applications of the spatial DBMS (e.g. GIS, CAD/CAM, LBS) handle massive amount of data and consume much time. Many spatial queries contain the multi-way spatial join due to the fact that ...
Iterative spatial join
The key issue in performing spatial joins is finding the pairs of intersecting rectangles. For unindexed data sets, this is usually resolved by partitioning the data and then performing a plane sweep on the individual partitions. The resulting join can ...
A Taxonomy for Distance-Based Spatial Join Queries
Join operation is one of the most used operations in database management systems, including spatial databases. Hence, spatial join queries are very important in spatial database processing. There are many different kinds of spatial join queries, due to ...
Comments