ABSTRACT
Efficient spatial joins are pivotal for many applications and particularly important for geographical information systems or for the simulation sciences where scientists work with spatial models. Past research has primarily focused on disk-based spatial joins; efficient in-memory approaches, however, are important for two reasons: a) main memory has grown so large that many datasets fit in it and b) the in-memory join is a very time-consuming part of all disk-based spatial joins.
In this paper we develop TOUCH, a novel in-memory spatial join algorithm that uses hierarchical data-oriented space partitioning, thereby keeping both its memory footprint and the number of comparisons low. Our results show that TOUCH outperforms known in-memory spatial-join algorithms as well as in-memory implementations of disk-based join approaches. In particular, it has a one order of magnitude advantage over the memory-demanding state of the art in terms of number of comparisons (i.e., pairwise object comparisons), as well as execution time, while it is two orders of magnitude faster when compared to approaches with a similar memory footprint. Furthermore, TOUCH is more scalable than competing approaches as data density grows.
- W. G. Aref and H. Samet. A Cost Model for Query Optimization Using R-Trees. In GIS '94.Google Scholar
- W. G. Aref and H. Samet. Cascaded Spatial Join Algorithms with Spatially Sorted Output. In GIS '96. Google ScholarDigital Library
- W. G. Aref and H. Samet. Hashing by Proximity to Process Duplicates in Spatial Databases. In CIKM '94. Google ScholarDigital Library
- L. Arge, M. de Berg, H. J. Haverkort, and K. Yi. The Priority R-tree: a practically efficient and worst-case optimal R-tree. In SIGMOD '04. Google ScholarDigital Library
- L. Arge, O. Procopiuc, S. Ramaswamy, T. Suel, and J. S. Vitter. Scalable Sweeping-Based Spatial Join. In VLDB '98. Google ScholarDigital Library
- N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-Tree: an efficient and robust access method for points and rectangles. SIGMOD Record, 19(2):322--331, 1990. Google ScholarDigital Library
- T. Brinkhoff, H.-P. Kriegel, and B. Seeger. Efficient Processing of Spatial Joins Using R-Trees. In SIGMOD '93. Google ScholarDigital Library
- J.-P. Dittrich and B. Seeger. Data Redundancy and Duplicate Detection in Spatial Join Processing. In ICDE 2000.Google Scholar
- R. Elmasri and S. B. Navathe. Fundamentals of Database Systems. Addison Wesley, 3rd edition, 2000. Google ScholarDigital Library
- A. Farris, A. Sharma, C. Niedermayr, D. Brat, D. Foran, F. Wang, J. Saltz, J. Kong, L. Cooper, T. Oh, T. Kurc, T. Pan, and W. Chen. A Data Model and Database for High-resolution Pathology Analytical Image Informatics. Journal of Pathology Informatics, 2(1):32, 2011.Google ScholarCross Ref
- Y. J. García, M. A. López, and S. T. Leutenegger. A Greedy Algorithm for Bulk Loading R-trees. In GIS '96.Google Scholar
- S. Gnanakaran, H. Nymeyer, J. Portman, K. Y. Sanbonmatsu, and A. E. Garcia. Peptide folding simulations. Current Opinion in Structural Biology, 13(2):168--174, 2003.Google ScholarCross Ref
- A. Guttman. R-trees: a Dynamic Index Structure for Spatial Searching. In SIGMOD '84. Google ScholarDigital Library
- O. P. Hamill, A. Marty, E. Neher, B. Sakmann, and F. J. Sigworth. Improved Patch-clamp Techniques for High-resolution Current Recording from Cells and Cell-free Membrane Patches. Pflügers Archiv European Journal of Physiology, 391:85--100, 1981.Google ScholarCross Ref
- E. H. Jacox and H. Samet. Spatial Join Techniques. ACM TODS '07. Google ScholarDigital Library
- I. Kamel and C. Faloutsos. Hilbert R-tree: An Improved R-tree using Fractals. In VLDB '94. Google ScholarDigital Library
- N. Koudas and K. C. Sevcik. Size Separation Spatial Join. In SIGMOD '97. Google ScholarDigital Library
- J. Kozloski, K. Sfyrakis, S. Hill, F. Schürmann, C. Peck, and H. Markram. Identifying, Tabulating, and Analyzing Contacts Between Branched Neuron Morphologies. IBM Journal of Research and Development, 52(1/2):43--55, 2008. Google ScholarDigital Library
- S. Leutenegger, M. Lopez, and J. Edgington. STR: a Simple and Efficient Algorithm for R-Tree Packing. In ICDE '97. Google ScholarDigital Library
- M.-L. Lo and C. V. Ravishankar. Spatial Hash-Joins. In SIGMOD '96. Google ScholarDigital Library
- M.-L. Lo and C. V. Ravishankar. Spatial Joins Using Seeded Trees. In SIGMOD '94. Google ScholarDigital Library
- G. Luo, J. F. Naughton, and C. J. Ellmann. A non-blocking parallel spatial join algorithm. In ICDE, pages 697--705, 2002. Google ScholarDigital Library
- N. Mamoulis and D. Papadias. Slot Index Spatial Join. IEEE TKDE, 15(1):211--231, 2003. Google ScholarDigital Library
- P. Mishra and M. H. Eich. Join Processing in Relational Databases. ACM Computing Surveys, 24(1):63--113, 1992. Google ScholarDigital Library
- S. Nobari, T.-T. Cao, P. Karras, and S. Bressan. Scalable parallel minimum spanning forest computation. In Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 205--214, New York, NY, USA, 2012. ACM. Google ScholarDigital Library
- J. Orenstein. A Comparison of Spatial Query Processing Techniques for Native and Parameter Spaces. In SIGMOD '90. Google ScholarDigital Library
- J. M. Patel and D. J. DeWitt. Partition Based Spatial-Merge Join. In SIGMOD '96. Google ScholarDigital Library
- F. Preparata and M. Shamos. Computational Geometry: An Introduction. Springer, 1993. Google ScholarDigital Library
- T. K. Sellis, N. Roussopoulos, and C. Faloutsos. The R++-Tree: A Dynamic Index for Multi-Dimensional Objects. In VLDB '87. Google ScholarDigital Library
- M. Ubell. The Montage Extensible DataBlade Architecture. In SIGMOD '94. Google ScholarDigital Library
Index Terms
- TOUCH: in-memory spatial join by hierarchical data-oriented partitioning
Recommendations
Integration of spatial join algorithms for processing multiple inputs
Several techniques that compute the join between two spatial datasets have been proposed during the last decade. Among these methods, some consider existing indices for the joined inputs, while others treat datasets with no index, providing solutions ...
Integration of spatial join algorithms for processing multiple inputs
SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of dataSeveral techniques that compute the join between two spatial datasets have been proposed during the last decade. Among these methods, some consider existing indices for the joined inputs, while others treat datasets with no index, providing solutions ...
Complex Spatial Query Processing
The user of a Geographical Information System is not limited to conventional spatial selections and joins, but may also pose more complicated and descriptive queries. In this paper, we focus on the efficient processing and optimization of complex ...
Comments