skip to main content
10.1145/2618243.2618262acmotherconferencesArticle/Chapter ViewAbstractPublication PagesssdbmConference Proceedingsconference-collections
research-article

Skew-resistant parallel in-memory spatial join

Published:30 June 2014Publication History

ABSTRACT

Spatial join is a crucial operation in many spatial analysis applications in scientific and geographical information systems. Due to the compute-intensive nature of spatial predicate evaluation, spatial join queries can be slow even with a moderate sized dataset. Efficient parallelization of spatial join is therefore essential to achieve acceptable performance for many spatial applications. Technological trends, including the rising core count and increasingly large main memory, hold great promise in this regard. Previous parallel spatial join approaches tried to partition the dataset so that the number of spatial objects in each partition was as equal as possible. They also focused only on the filter step. However, when the more compute-intensive refinement step is included, significant processing skew may arise due to the uneven size of the objects. This processing skew significantly limits the achievable parallel performance of the spatial join queries, as the longest-running spatial partition determines the overall query execution time.

Our solution is SPINOJA, a skew-resistant parallel in-memory spatial join infrastructure. SPINOJA introduces MOD-Quadtree declustering, which partitions the spatial dataset such that the amount of computation demanded by each partition is equalized and the processing skew is minimized. We compare three work metrics used to create the partitions and three load-balancing strategies to assign the partitions to multiple cores. SPINOJA uses an in-memory column-store to store the spatial tables. Our evaluation shows that SPINOJA outperforms in-memory implementations of previous spatial join approaches by a significant margin and a recently proposed in-memory spatial join algorithm by an order of magnitude.

References

  1. A. Aboulnaga and J. F. Naughton. Accurate Estimation of the Cost of Spatial Selections. In ICDE, pages 123--134, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Aji, F. Wang, H. Vo, R. Lee, Q. Liu, X. Zhang, and J. Saltz. Hadoop-GIS: A High Performance Spatial Data Warehousing System over Mapreduce. PVLDB, pages 1009--1020, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. J. L. Bentley and T. A. Ottmann. Algorithms for reporting and counting geometric intersections. IEEE Trans. Comput., 28(9):643--647, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. T. Brinkhoff, H.-P. Kriegel, and B. Seeger. Efficient processing of spatial joins using R-trees. In SIGMOD, pages 237--246, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. T. Brinkhoff, H. peter Kriegel, and B. Seeger. Parallel Processing of Spatial Joins Using R-trees. In ICDE, pages 258--265, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. E. Clementini and P. Di Felice. A model for representing topological relationships between complex geometric features in spatial databases. Information Sciences, pages 121--136, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Colantonio and R. Di Pietro. CONCISE: COmpressed 'N' Composable Integer SEt. Information Processing Letters, 110:644--650, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. J. Egenhofer and R. Franzosa. Point-set topological spatial relations. Intl Journal of GIS, pages 161--174, 1991.Google ScholarGoogle Scholar
  9. R. H. Güting. An introduction to spatial database systems. In VLDB, pages 357--399, 1994.Google ScholarGoogle Scholar
  10. E. H. Jacox and H. Samet. Spatial join techniques. ACM Transactions on Database Systems, 32(1), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. N. Koudas and K. C. Sevcik. Size separation spatial join. In SIGMOD, pages 324--335, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. S. Nobari, F. Tauheed, T. Heinis, P. Karras, S. Bressan, and A. Ailamaki. Touch: In-memory spatial join by hierarchical data-oriented partitioning. In SIGMOD, pages 701--712, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Open Geospatial Consortium. Simple Feature Access - Part 2: SQL Option. http://www.opengeospatial.org/standards/sfs.Google ScholarGoogle Scholar
  14. J. A. Orenstein. Spatial query processing in an object-oriented database system. In SIGMOD, pages 326--336, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. M. Patel and D. J. DeWitt. Partition based spatial-merge join. In SIGMOD, pages 259--270, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. M. Patel and D. J. DeWitt. Clone join and shadow join: two parallel spatial join algorithms. In SIGSPATIAL, pages 54--61, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Ray, B. Simion, and A. D. Brown. Jackpine: A benchmark to evaluate spatial database performance. In ICDE, pages 1139--1150, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. Ray, B. Simion, A. D. Brown, and R. Johnson. A Parallel Spatial Data Analysis Infrastructure for the Cloud. In SIGSPATIAL, pages 274--283, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. http://www.census.gov/geo/www/tiger.Google ScholarGoogle Scholar
  21. S. Zhang, J. Han, Z. Liu, K. Wang, and Z. Xu. SJMR: Parallelizing spatial join with MapReduce on clusters. In CLUSTER, pages 1--8, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  22. X. Zhou, D. J. Abel, and D. Truffet. Data partitioning for parallel spatial join processing. Geoinformatica, pages 175--204, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Skew-resistant parallel in-memory spatial join

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Other conferences
              SSDBM '14: Proceedings of the 26th International Conference on Scientific and Statistical Database Management
              June 2014
              417 pages
              ISBN:9781450327220
              DOI:10.1145/2618243

              Copyright © 2014 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 30 June 2014

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article

              Acceptance Rates

              SSDBM '14 Paper Acceptance Rate26of71submissions,37%Overall Acceptance Rate56of146submissions,38%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader