A demonstration of SpatialHadoop: an efficient mapreduce framework for spatial data

Authors:
Ahmed Eldawy, Department of Computer Science and Engineering, University of Minnesota
Mohamed F. Mokbel, Department of Computer Science and Engineering, University of Minnesota

Proceedings of the VLDB Endowment, Volume 6, Issue 12, pp. 1230–1233. https://doi.org/10.14778/2536274.2536283

Published: 01 August 2013

Abstract

This demo presents SpatialHadoop as the first full-fledged MapReduce framework with native support for spatial data. SpatialHadoop is a comprehensive extension to Hadoop that pushes spatial data support into the core functionality of Hadoop. SpatialHadoop runs existing Hadoop programs as is, yet it achieves order(s) of magnitude better performance than Hadoop when dealing with spatial data. SpatialHadoop employs a simple high-level spatial language, a two-level spatial index structure, basic spatial components built inside the MapReduce layer, and three basic spatial operations: range queries, k-NN queries, and spatial join. Other spatial operations can be deployed in SpatialHadoop in a similar way. We demonstrate a real system prototype of SpatialHadoop running on an Amazon EC2 cluster against two sets of real spatial data obtained from TIGER files and OpenStreetMap, with sizes of 60 GB and 300 GB, respectively.
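To make the idea of a two-level index and a range query concrete, the following is a minimal single-machine sketch in Python. It is not SpatialHadoop's code or API: the class name `TwoLevelGridIndex`, the uniform grid used as the global level, and the x-sorted list used as the local level are all assumptions chosen for illustration. It assumes that the global level decides which partition a point belongs to, and that the local level organizes the points inside each partition, so a range query can skip partitions that cannot overlap the query rectangle.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class TwoLevelGridIndex:
    """Illustrative two-level index over 2-D points (hypothetical, not SpatialHadoop's API).

    Global level: a uniform grid maps each point to one partition (cell).
    Local level: points inside a partition are kept sorted by x.
    """

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.partitions = defaultdict(list)  # cell id -> list of (x, y)

    def _cell(self, x, y):
        # Floor division also handles negative coordinates correctly.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, x, y):
        self.partitions[self._cell(x, y)].append((x, y))

    def build_local(self):
        # Build the local level: sort every partition by (x, y).
        for points in self.partitions.values():
            points.sort()

    def range_query(self, xmin, ymin, xmax, ymax):
        """Return all points inside the closed rectangle, sorted."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        result = []
        # Global filter: visit only cells that overlap the query rectangle.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                points = self.partitions.get((cx, cy))
                if not points:
                    continue
                # Local filter: binary search on x, then check y.
                lo = bisect_left(points, (xmin, float("-inf")))
                hi = bisect_right(points, (xmax, float("inf")))
                result.extend(p for p in points[lo:hi] if ymin <= p[1] <= ymax)
        return sorted(result)


if __name__ == "__main__":
    index = TwoLevelGridIndex(cell_size=10.0)
    for point in [(1, 1), (5, 5), (12, 3), (25, 25), (9.5, 9.5)]:
        index.insert(*point)
    index.build_local()
    print(index.range_query(0, 0, 10, 10))  # [(1, 1), (5, 5), (9.5, 9.5)]
```

In a MapReduce setting, the global filter would plausibly correspond to choosing which file blocks are handed to map tasks, and the local filter to the work done inside each map task; that mapping is an interpretation of the abstract, not a description of the actual implementation.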

Index Terms

1. Information systems
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.

Published in

Proceedings of the VLDB Endowment, Volume 6, Issue 12
August 2013
264 pages
ISSN: 2150-8097
Publisher: VLDB Endowment