ABSTRACT
Apache Flink is an open-source system for scalable processing of batch and streaming data. However, Flink does not natively support efficient processing of spatial data streams, which many applications dealing with spatial data require. Other scalable spatial data processing platforms, such as GeoSpark and SpatialHadoop, do not support streaming workloads either and can only handle static (batch) workloads. To fill this gap, we present GeoFlink, which extends Apache Flink to support spatial data types, indexes, and continuous queries over spatial data streams. To enable efficient processing of continuous spatial queries and effective data distribution across Flink cluster nodes, GeoFlink introduces a grid-based index. GeoFlink currently supports spatial range, spatial kNN, and spatial join queries on the point data type. An experimental study on real spatial data streams shows that GeoFlink achieves significantly higher query throughput than ordinary Flink processing.
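The abstract does not describe the grid index in detail, so the following is only a minimal sketch of the general idea under stated assumptions: a uniform grid over a fixed bounding box, a cell key per point that could serve as the partitioning key (analogous to a Flink `keyBy`), and a filter-and-refine range query that inspects only the cells overlapping the query region. The names `UniformGrid`, `partition_by_cell`, `range_query` and `knn_query` are hypothetical and are not GeoFlink's API; the kNN shown here is a simplified radius-bounded variant, not necessarily the algorithm GeoFlink uses.

```python
import heapq
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    obj_id: str
    x: float
    y: float


class UniformGrid:
    """Hypothetical uniform grid over a fixed bounding box (not GeoFlink's API)."""

    def __init__(self, min_x, min_y, max_x, max_y, cells_per_side):
        self.min_x, self.min_y = min_x, min_y
        self.n = cells_per_side
        self.cell_w = (max_x - min_x) / cells_per_side
        self.cell_h = (max_y - min_y) / cells_per_side

    def _clamp(self, i):
        # Points outside the bounding box are assigned to the nearest border cell.
        return max(0, min(i, self.n - 1))

    def cell_of(self, x, y):
        """Return the (column, row) key of the cell containing (x, y)."""
        i = math.floor((x - self.min_x) / self.cell_w)
        j = math.floor((y - self.min_y) / self.cell_h)
        return (self._clamp(i), self._clamp(j))

    def cells_in_box(self, x1, y1, x2, y2):
        """All cell keys overlapping the axis-aligned box [x1, x2] x [y1, y2]."""
        i1, j1 = self.cell_of(x1, y1)
        i2, j2 = self.cell_of(x2, y2)
        return {(i, j) for i in range(i1, i2 + 1) for j in range(j1, j2 + 1)}


def partition_by_cell(grid, points):
    """Group a window of points by cell key, mimicking a keyBy on the cell id."""
    parts = defaultdict(list)
    for p in points:
        parts[grid.cell_of(p.x, p.y)].append(p)
    return parts


def range_query(grid, parts, qx, qy, radius):
    """Points within `radius` of (qx, qy): grid filter, then exact distance refine."""
    # Filter: only cells overlapping the query's bounding square are inspected.
    candidates = grid.cells_in_box(qx - radius, qy - radius, qx + radius, qy + radius)
    hits = []
    for cell in candidates:
        for p in parts.get(cell, []):
            # Refine: exact Euclidean distance check on candidate points.
            if math.hypot(p.x - qx, p.y - qy) <= radius:
                hits.append(p)
    return hits


def knn_query(grid, parts, qx, qy, k, radius):
    """Radius-bounded kNN: the k nearest points among those within `radius`."""
    hits = range_query(grid, parts, qx, qy, radius)
    return heapq.nsmallest(k, hits, key=lambda p: math.hypot(p.x - qx, p.y - qy))


# Usage: a 10 x 10 space split into 5 x 5 cells of width 2.
grid = UniformGrid(0.0, 0.0, 10.0, 10.0, cells_per_side=5)
window = [Point("a", 1.0, 1.0), Point("b", 2.5, 1.0),
          Point("c", 9.0, 9.0), Point("d", 1.5, 1.2)]
parts = partition_by_cell(grid, window)
print(sorted(p.obj_id for p in range_query(grid, parts, 1.0, 1.0, 1.0)))  # ['a', 'd']
print([p.obj_id for p in knn_query(grid, parts, 1.0, 1.0, 2, 2.0)])      # ['a', 'd']
```

Because the cell index is monotone in each coordinate, every point inside the query box falls in one of the candidate cells, so the filter step never drops a true result; it only avoids examining points in distant cells such as the one holding `c`. In a distributed setting the same cell key could decide which worker receives each point, which is the kind of data distribution the abstract attributes to the grid index.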
Index Terms
- GeoFlink: A Distributed and Scalable Framework for the Real-time Processing of Spatial Streams