Published in: GeoInformatica 4/2021

03-07-2019

Parallel and scalable processing of spatio-temporal RDF queries using Spark

Authors: Panagiotis Nikitopoulos, Akrivi Vlachou, Christos Doulkeridis, George A. Vouros


Abstract

The ever-increasing volume of data emanating from mobile devices and sensors dictates the use of distributed systems for storing and querying these data. Typically, such data sources provide spatio-temporal information alongside other useful data. The RDF data model can be used to interlink and exchange data originating from heterogeneous sources in a uniform manner. For example, consider the case where vessels report their spatio-temporal position at regular intervals through various surveillance systems. In this scenario, a user might want to know which vessels were moving in a specific area during a given time range. In this paper, we address the problem of efficiently storing and querying spatio-temporal RDF data in parallel. Specifically, we study SPARQL queries with spatio-temporal constraints and propose the DiStRDF system, which comprises a Storage Layer and a Processing Layer. The DiStRDF Storage Layer is responsible for efficiently storing large amounts of historical spatio-temporal RDF data of moving objects. On top of it, we devise the DiStRDF Processing Layer, which parses a SPARQL query and produces the corresponding logical and physical execution plans. We use Spark, a well-known distributed in-memory processing framework, as the underlying processing engine. Our experimental evaluation on real data from the aviation and maritime domains demonstrates the efficiency of the DiStRDF system under various spatio-temporal range constraints.
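To make the vessel scenario concrete, the following is a minimal plain-Python sketch of a spatio-temporal range constraint: it keeps position reports that fall inside a bounding box and a time interval, then returns the distinct vessels. It is an illustration only, not the DiStRDF implementation, and it uses neither RDF, SPARQL, nor Spark. The names `PositionReport`, `SpatioTemporalRange`, and `vessels_in_range`, as well as the sample coordinates and timestamps, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PositionReport:
    """One reported position of a vessel (hypothetical record layout)."""
    vessel_id: str
    lon: float
    lat: float
    timestamp: datetime


@dataclass(frozen=True)
class SpatioTemporalRange:
    """A bounding box plus a closed time interval (hypothetical query shape)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    start: datetime
    end: datetime

    def contains(self, report: PositionReport) -> bool:
        # A report qualifies only if it satisfies both the spatial
        # and the temporal part of the constraint.
        in_box = (self.min_lon <= report.lon <= self.max_lon
                  and self.min_lat <= report.lat <= self.max_lat)
        in_time = self.start <= report.timestamp <= self.end
        return in_box and in_time


def vessels_in_range(reports, query):
    """Return the sorted, distinct vessel ids with at least one matching report."""
    return sorted({r.vessel_id for r in reports if query.contains(r)})


# Made-up sample data, for illustration only.
reports = [
    PositionReport("vessel_a", 23.60, 37.90, datetime(2016, 1, 1, 10, 0)),
    PositionReport("vessel_b", 25.10, 35.30, datetime(2016, 1, 1, 10, 5)),
    PositionReport("vessel_a", 23.70, 37.95, datetime(2016, 1, 1, 10, 30)),
    PositionReport("vessel_c", 23.65, 37.92, datetime(2016, 1, 2, 9, 0)),
]
query = SpatioTemporalRange(
    min_lon=23.5, min_lat=37.8, max_lon=23.8, max_lat=38.0,
    start=datetime(2016, 1, 1, 0, 0), end=datetime(2016, 1, 1, 23, 59),
)

result = vessels_in_range(reports, query)
print(result)
```

In this sample, only `vessel_a` satisfies both parts of the constraint: `vessel_b` lies outside the bounding box and `vessel_c` reports outside the time interval. A distributed system would evaluate the same kind of predicate in parallel over partitioned data rather than over an in-memory list.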


Metadata
Title
Parallel and scalable processing of spatio-temporal RDF queries using Spark
Authors
Panagiotis Nikitopoulos
Akrivi Vlachou
Christos Doulkeridis
George A. Vouros
Publication date
03-07-2019
Publisher
Springer US
Published in
GeoInformatica / Issue 4/2021
Print ISSN: 1384-6175
Electronic ISSN: 1573-7624
DOI
https://doi.org/10.1007/s10707-019-00371-0
