SPARTAN: Semantic integration of big spatio-temporal data from streaming and archival sources
Introduction
The ever-increasing size of spatio-temporal data and the unprecedented rate at which a wide variety of sources generate data for situation awareness and monitoring in critical domains raise the need for scalable, real-time management and analysis of mobility data. Several data analysis tasks rely on moving entities’ trajectories, while trajectory detection and prediction are typically used to optimize everyday, real-life operations. However, using only the kinematic information provided by surveillance sources is far from sufficient when a wealth of other sources, including, for instance, weather and contextual information, is available as well. Consequently, one of the major challenges is to enrich surveillance data with meaningful information about moving entities’ trajectories and to annotate trajectories with related events, thereby creating enriched trajectories [[1], [2]]. Addressing this challenge calls for real-time processing and semantic integration of surveillance data with other streaming and archival data sources [3].
Our work is motivated by the need to advance the management and integrated exploitation of voluminous and heterogeneous data-at-rest (archival data) and data-in-motion (streaming data) sources, so as to significantly promote the safety and effectiveness of critical operations for large numbers of moving entities in large geographical areas. Challenges throughout the Big Data ecosystem, with special focus on surveillance systems, concern effective detection and prediction of moving entities’ trajectories and forecasting of complex events associated with these trajectories. These challenges emerge as the number of moving entities and data sources increases at an unprecedented scale. The result is vast volumes of heterogeneous data generated at extremely high rates, whose exploitation calls for novel big data integration techniques that facilitate advanced data analytics.
In this paper, we propose SPARTAN, a big data framework that ingests streaming spatio-temporal data in real-time, extracts useful information, performs data cleaning and summarization, transforms data to RDF in compliance with a generic ontology for trajectories (also connected to domain aspects and domain-related data sources), and integrates surveillance data with other streaming and archival data sources. As a result, enriched surveillance data is produced, associating mobility data with other data, thereby offering opportunities for higher level analysis tasks, such as trajectory prediction and complex event recognition and forecasting, to achieve higher levels of accuracy. In technical terms, we provide an efficient and scalable implementation of the proposed framework on top of parallel data processing platforms, based on Apache Flink and Kafka.
In more concrete terms, SPARTAN introduces the following innovative features in the integration process for mobility data, considering trajectories to be “first-class entities”: (a) an online trajectory compression technique that produces accurate and compact trajectory synopses in real-time, in contrast to existing works that do not create synopses within milliseconds (or a few seconds at most) since the arrival of raw messages [[4], [5]], (b) an efficient data transformation method from heterogeneous sources to RDF, offering flexibility and consuming data from a wide variety of input sources, and (c) a spatio-temporal link discovery mechanism that integrates trajectory data with other contextual and weather data using spatio-temporal relations; an issue largely overlooked in the state-of-the-art frameworks for link discovery [[6], [7]] (see also [8] for a recent survey). Moreover, all the above innovations are provided as an integrated prototype that consumes streaming (and archival) data and operates in real-time.
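To make the idea of an online trajectory synopsis concrete, the following is a minimal sketch in Python. It is not the synopsis algorithm of SPARTAN; a simple heading-change rule is swapped in for illustration. The names `Position`, `SynopsisGenerator`, and the parameter `turn_threshold_deg` are assumptions introduced here. Each incoming position is examined once, in arrival order, and is kept only if it is the first point of an entity or if the entity's heading has deviated from its reference heading by more than the threshold.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    entity_id: str
    t: float    # timestamp in seconds
    lon: float  # longitude in degrees
    lat: float  # latitude in degrees


def bearing_deg(a: Position, b: Position) -> float:
    """Initial great-circle bearing from a to b, in degrees within [0, 360)."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dlon = math.radians(b.lon - a.lon)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


class SynopsisGenerator:
    """Hypothetical one-pass filter: keeps the first point and turning points."""

    def __init__(self, turn_threshold_deg: float = 15.0):
        self.turn_threshold_deg = turn_threshold_deg
        # entity_id -> (previous position, reference heading or None)
        self._state = {}

    def process(self, p: Position) -> bool:
        """Return True if p belongs in the synopsis, False if it can be dropped."""
        state = self._state.get(p.entity_id)
        if state is None:
            self._state[p.entity_id] = (p, None)
            return True  # always keep the first point of an entity
        prev, ref = state
        if (p.lon, p.lat) == (prev.lon, prev.lat):
            return False  # no displacement, so the heading is undefined
        heading = bearing_deg(prev, p)
        if ref is None:
            self._state[p.entity_id] = (p, heading)
            return False  # second point only establishes the reference heading
        # Smallest absolute angle between the two headings, in [0, 180].
        turn = abs((heading - ref + 180.0) % 360.0 - 180.0)
        if turn > self.turn_threshold_deg:
            self._state[p.entity_id] = (p, heading)
            return True
        # Keep the old reference so that slow, cumulative turns are still caught.
        self._state[p.entity_id] = (p, ref)
        return False
```

Because only a constant amount of state is held per entity, a filter of this shape can decide on each message as it arrives, which is the property that matters for low-latency synopsis generation.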
In summary, this paper makes the following contributions:
We propose a big data framework for the provision of streaming mobility data, transformed in RDF and enriched with other data sources, with low latency requirements. Our framework entails the following specific innovations:
- We show how to compress surveillance data in an online fashion, by constructing trajectory synopses that are both space-efficient and highly accurate, with low latency.
- We present an efficient and flexible data transformation tool that accesses heterogeneous streaming and archival data from a variety of diverse data sources and generates RDF graph fragments in compliance with the datAcron ontology [[9], [10]].
- We propose a generic spatio-temporal link discovery module that operates on streaming data, and efficiently discovers spatio-temporal relations, while supporting blocking techniques and different evaluation functions.
- We evaluate our approach experimentally using a prototype implementation on top of big data technologies and real-life data, thereby providing evidence about the efficiency of the framework and its potential to provide enriched RDF streams of surveillance data.
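The role of blocking in spatio-temporal link discovery, mentioned in the contributions above, can be illustrated with a small Python sketch. It is written under stated assumptions and is not the module described in the paper: the class `GridBlocker`, the function `discover_links`, the relation name `nearTo`, and the use of plain distances in degrees are all hypothetical choices made for brevity. Archival entities are indexed in an equi-sized grid, and each streaming position is compared only against entities in its own cell and the eight neighbouring cells, after which an evaluation function checks spatial proximity and temporal containment.

```python
import math
from collections import defaultdict


class GridBlocker:
    """Hypothetical equi-grid index over archival entities with a validity interval."""

    def __init__(self, cell_size_deg: float):
        self.cell = cell_size_deg
        self.cells = defaultdict(list)

    def _key(self, lon: float, lat: float):
        return (math.floor(lon / self.cell), math.floor(lat / self.cell))

    def add(self, entity_id, lon, lat, t_start, t_end):
        self.cells[self._key(lon, lat)].append((entity_id, lon, lat, t_start, t_end))

    def candidates(self, lon: float, lat: float):
        """Yield entities in the cell of (lon, lat) and in its 8 neighbours."""
        cx, cy = self._key(lon, lat)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                yield from self.cells.get((cx + dx, cy + dy), ())


def discover_links(blocker, point_id, lon, lat, t, max_dist_deg):
    """Return (point, relation, entity) links for one streaming position."""
    if max_dist_deg > blocker.cell:
        # The 3x3 neighbourhood only covers every match if the search
        # radius does not exceed the cell size.
        raise ValueError("max_dist_deg must not exceed the grid cell size")
    links = []
    for eid, elon, elat, t0, t1 in blocker.candidates(lon, lat):
        near = math.hypot(lon - elon, lat - elat) <= max_dist_deg  # planar approximation
        during = t0 <= t <= t1
        if near and during:
            links.append((point_id, "nearTo", eid))
    return links
```

The evaluation function is deliberately isolated in the body of the loop, so a different spatial or temporal predicate could be substituted without touching the blocking step.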
The rest of this paper is structured as follows: Section 2 reviews the related work and clarifies how our work advances the state-of-the-art. Section 3 presents the targeted problem setting and motivates our work. Section 4 briefly describes the datAcron ontology for the representation of semantic trajectories. Section 5 describes the overall semantic integration framework, and delves into the details of its components. Section 6 provides technical details on our prototype implementation using big data technologies. Section 7 demonstrates the efficiency of our framework by means of experimental evaluation using real-life datasets. Section 8 provides a discussion on how SPARTAN can be exploited for improving data analysis tasks. Finally, Section 9 concludes the paper and sketches future research directions.
Section snippets
Related work
There are efforts on semantic integration of streaming with archival data designed to operate on RDF, such as [[11], [12]], or efforts towards a framework for the integration of distributed heterogeneous streaming and stored data sources through ontological models, e.g. in [13]. Recently, in [[14], [15]], an approach for integration of streaming with static relational data has been proposed. The Graph of Things [16] targets an IoT setting where many sources provide data for integration and
Motivation & problem setting
Trajectory-based operations, which involve spatio-temporal data of moving entities, have become increasingly important in real-life applications, as they lead to increased safety and minimize cost [[46], [47]]. A key issue in achieving these targets is increasing the predictability of trajectories and of events related to the behavior of moving entities. Thus, several analysis tasks revolve around trajectories, including future location and trajectory prediction as well as complex event recognition and
The datAcron ontology
The datAcron ontology was developed to be used as a core ontology for the Maritime Situation Awareness (MSA) and Air Traffic Management (ATM) domains, towards supporting analysis tasks exploiting trajectories at various levels of analysis. Its development has been driven by ontologies related to our objectives (e.g. DUL, SimpleFeature,
The SPARTAN framework for semantic integration of spatio-temporal data
SPARTAN is a framework for semantic integration of streaming mobility data with other data sources. It comprises three main components, as illustrated in Fig. 5: (a) Synopses Generator, (b) Data Transformation, and (c) Link Discovery. The components correspond to fundamental steps in the big data analysis pipeline [49], namely (a) data acquisition, cleaning, and filtering, (b) data extraction and representation, and (c) data integration.
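As an illustration of the data extraction and representation step (b), the sketch below turns one positional record into RDF triples serialized as N-Triples lines. It is only a sketch under stated assumptions: the namespace `http://example.org/mobility#` and the terms `PositionNode`, `ofMovingObject`, `hasLongitude`, `hasLatitude`, and `hasTimestamp` are placeholders invented here and are not terms of the datAcron ontology; the record layout (`id`, `t`, `lon`, `lat`) is likewise hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import quote

EX = "http://example.org/mobility#"  # placeholder namespace, NOT the datAcron ontology
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, datatype: str) -> str:
    """Build a typed literal using an XML Schema datatype."""
    return f'"{value}"^^<{XSD}{datatype}>'


def record_to_ntriples(record: dict) -> list:
    """Map one positional record to a small RDF graph fragment."""
    obj_id = quote(str(record["id"]), safe="")  # keep the identifier IRI-safe
    node = iri(f"{EX}node/{obj_id}/{record['t']}")
    ts = datetime.fromtimestamp(record["t"], tz=timezone.utc).isoformat()
    triples = [
        (node, iri(RDF_TYPE), iri(EX + "PositionNode")),
        (node, iri(EX + "ofMovingObject"), iri(f"{EX}object/{obj_id}")),
        (node, iri(EX + "hasLongitude"), literal(record["lon"], "double")),
        (node, iri(EX + "hasLatitude"), literal(record["lat"], "double")),
        (node, iri(EX + "hasTimestamp"), literal(ts, "dateTime")),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    for line in record_to_ntriples({"id": "v1", "t": 60, "lon": 23.6, "lat": 37.9}):
        print(line)
```

Emitting small, self-contained fragments per record keeps the transformation stateless, which suits a streaming setting where each message is processed independently.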
In brief, streaming positional data, which is the primary
The SPARTAN big data architecture
In this section, we present the design of the SPARTAN big data architecture, focusing on the implementation of individual components as well as the communication mechanism used for integrating the different components.
Experimental evaluation
In this section, we first describe the main datasets used in the evaluation (Section 7.1). Then, we provide experimental results for the individual components in Section 7.2 in order to study their performance individually, and then present the empirical evaluation of the integrated prototype in Section 7.3.
Discussion
With SPARTAN’s approach, raw trajectory data is transformed into multidimensional sequences (semantic trajectory data) that form a more realistic representation model of complex everyday life [1]; the mobility of vessels belongs to this broad class. Operating on such compressed but semantified time-series may facilitate several analysis tasks. For instance, clustering analysis may benefit from additional variables by incorporating the principle of divide-and-conquer via a semantic-aware
Conclusions and outlook
In this paper, we presented SPARTAN, a framework for real-time semantic integration of big mobility data with other data sources, aiming at providing enriched trajectories that are exploited by higher level analysis tasks. Our framework contains methods for data cleaning and filtering, data transformation, and link discovery, thereby offering an end-to-end solution to the problem of providing enriched streams of mobility data. In our future work, we intend to study in depth how the enriched
Acknowledgments
This work is supported by the datAcron project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 687591 (http://datacron-project.eu).
Christos Doulkeridis received the B.Sc. degree in electrical engineering and computer science from the National Technical University of Athens and the M.Sc. and Ph.D. degrees in Information Systems from the Department of Informatics of Athens University of Economics and Business. He is currently an assistant professor in the Department of Digital Systems of the University of Piraeus. His research interests include parallel and distributed data management, and data analytics.
References (54)
- et al., Semantic trajectories: mobility data computation and annotation, ACM TIST (2013)
- et al., Semantic access to streaming and static data at siemens, J. Web Sem. (2017)
- et al., The baquara knowledge-based framework for semantic enrichment and analysis of movement data, Data Knowl. Eng. (2015)
- et al., Semantic trajectories modeling and analysis, ACM Comput. Surv. (2013)
- C. Claramunt, C. Ray, E. Camossi, A. Jousselme, M. Hadzagic, G.L. Andrienko, N.V. Andrienko, Y. Theodoridis, G.A....
- K. Patroumpas, A. Artikis, N. Katzouris, M. Vodas, Y. Theodoridis, N. Pelekis, Event recognition for maritime...
- et al., Online event recognition from moving vessel trajectories, GeoInformatica (2017)
- A.N. Ngomo, S. Auer, LIMES - A time-efficient approach for large-scale link discovery on the web of data, in: IJCAI...
- R. Isele, A. Jentzsch, C. Bizer, Efficient multidimensional blocking for link discovery without losing recall, in:...
- et al., A survey of current link discovery frameworks, Sem. Web (2017)
- The graph of things: A step towards the live knowledge graph of connected things, J. Web Sem.
- C-SPARQL: a continuous query language for RDF data streams, Int. J. Sem. Comput.
- Towards semantic interpretation of movement behavior
- Constant - A conceptual data model for semantic trajectories of moving objects, Trans. GIS
- Algorithms for the reduction of the number of points required to represent a digitized line or its caricature, Canad. Cartogr.
- Trajectory simplification: On minimizing the direction-based error, Proc. VLDB Endow.
- Compression of trajectory data: a comprehensive evaluation and new approach, Geoinformatica
- Spatio-temporal data reduction with deterministic error bounds, VLDB J.