Abstract
This demo presents SpatialHadoop as the first full-fledged MapReduce framework with native support for spatial data. SpatialHadoop is a comprehensive extension to Hadoop that builds spatial data support into Hadoop's core functionality. It runs existing Hadoop programs as is, yet achieves an order of magnitude or more better performance than Hadoop when dealing with spatial data. SpatialHadoop employs a simple high-level spatial language, a two-level spatial index structure, basic spatial components built inside the MapReduce layer, and three basic spatial operations: range queries, k-NN queries, and spatial join. Other spatial operations can be deployed in SpatialHadoop in a similar way. We demonstrate a real system prototype of SpatialHadoop running on an Amazon EC2 cluster against two real spatial datasets, obtained from TIGER files and OpenStreetMap, with sizes of 60GB and 300GB, respectively.
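To make the idea of a two-level index concrete, here is a minimal single-machine sketch of a range query over point data. It is an illustration under stated assumptions, not SpatialHadoop's implementation: the global level is assumed to be a uniform grid whose cells stand in for partitions, the local level is assumed to be a per-cell list sorted by x, and the names `build_index` and `range_query` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_index(points, cell_size):
    """Build a toy two-level index (assumed layout, for illustration only).

    Global level: each point goes to a grid cell; a cell plays the role
    of one partition. Local level: each cell keeps its points sorted by x.
    """
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    for pts in cells.values():
        pts.sort()
    return dict(cells)


def range_query(index, cell_size, x1, y1, x2, y2):
    """Return all indexed points with x1 <= x <= x2 and y1 <= y <= y2."""
    result = []
    # Global filter: visit only the cells that overlap the query rectangle.
    for cx in range(int(x1 // cell_size), int(x2 // cell_size) + 1):
        for cy in range(int(y1 // cell_size), int(y2 // cell_size) + 1):
            pts = index.get((cx, cy))
            if not pts:
                continue
            # Local filter: bisect on x, then test y on the survivors.
            lo = bisect_left(pts, (x1, float("-inf")))
            hi = bisect_right(pts, (x2, float("inf")))
            result.extend(p for p in pts[lo:hi] if y1 <= p[1] <= y2)
    return sorted(result)


# Usage: index a few points with 10x10 cells and query the square [0, 10] x [0, 10].
sample = [(1, 1), (2, 5), (7, 3), (12, 12), (15, 2)]
sample_index = build_index(sample, cell_size=10)
hits = range_query(sample_index, 10, 0, 0, 10, 10)
```

The point of the two levels is pruning: the global step skips whole partitions that cannot contain results, and the local step avoids scanning every record inside the partitions that remain.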