Elsevier

Future Generation Computer Systems

Volume 110, September 2020, Pages 540-555

SPARTAN: Semantic integration of big spatio-temporal data from streaming and archival sources

https://doi.org/10.1016/j.future.2018.07.007

Highlights

  • A big data framework for semantic integration of streaming mobility data.

  • Online compression of surveillance data, producing accurate trajectory synopses.

  • A flexible transformation tool from diverse streaming and archival sources to RDF.

  • A spatio-temporal link discovery module that operates on streaming data.

Abstract

An ever-increasing number of applications in critical domains, such as maritime and aviation, generate, collect, manage and process spatio-temporal data related to the mobility of entities. This wealth of data can be exploited for various purposes, towards improving the safety of operations, reducing economic costs, and increasing dependability. The major issue in achieving these objectives is increasing the predictability of moving objects’ trajectories and events. To achieve this in a data-driven way, we need to exploit, in an integrated manner, data from a variety of disparate and heterogeneous data sources, both streaming and archival, including surveillance, weather, and contextual data. Motivated by this fact, in this paper, we propose a framework for semantic integration of big mobility data with other data sources that are necessary for data analytics tasks, providing a unified representation of such data. Notable features of our framework include the real-time generation of synopses of moving entities’ trajectories, the efficient and flexible transformation of data from heterogeneous big data sources to RDF, and spatio-temporal link discovery between spatio-temporal entities in diverse data sources. The design and implementation of our framework use big data technologies (Apache Flink and Kafka), and our experimental evaluation demonstrates the efficiency and scalability of the proposed framework using large, real-life datasets.

Introduction

The ever-increasing size of spatio-temporal data and the unprecedented rate at which it is generated by a wide variety of sources for situation awareness and monitoring in critical domains raise the need for scalable, real-time management and analysis of mobility data. Several data analysis tasks rely on moving entities’ trajectories, while trajectory detection and prediction are typically used to optimize everyday, real-life operations. However, the kinematic information provided by surveillance sources alone is far from sufficient when a wealth of other information, for instance weather and contextual information, is also available. Consequently, one of the major challenges is to enrich surveillance data, providing meaningful information about moving entities’ trajectories and annotating trajectories with related events, thereby creating enriched trajectories [[1], [2]]. Addressing this challenge calls for real-time processing and semantic integration of surveillance data with other streaming and archival data sources [3].

Our work is motivated by the need to advance the management and integrated exploitation of voluminous and heterogeneous data-at-rest (archival data) and data-in-motion (streaming data) sources, so as to significantly promote the safety and effectiveness of critical operations involving large numbers of moving entities over large geographical areas. Challenges throughout the Big Data ecosystem, with special focus on surveillance systems, concern the effective detection and prediction of moving entities’ trajectories and the forecasting of complex events associated with these trajectories. These challenges emerge as the number of moving entities and data sources increases at an unprecedented scale, generating vast volumes of heterogeneous data at extremely high rates, whose exploitation calls for novel big data integration techniques that facilitate advanced data analytics.

In this paper, we propose SPARTAN, a big data framework that ingests streaming spatio-temporal data in real time, extracts useful information, performs data cleaning and summarization, transforms data to RDF in compliance with a generic ontology for trajectories (also connected to domain aspects and domain-related data sources), and integrates surveillance data with other streaming and archival data sources. The result is enriched surveillance data that associates mobility data with other data, thereby enabling higher-level analysis tasks, such as trajectory prediction and complex event recognition and forecasting, to achieve higher accuracy. In technical terms, we provide an efficient and scalable implementation of the proposed framework on top of parallel data processing platforms, namely Apache Flink and Kafka.

In more concrete terms, SPARTAN introduces the following innovative features in the integration process for mobility data, treating trajectories as “first-class entities”: (a) an online trajectory compression technique that produces accurate and compact trajectory synopses in real time, in contrast to existing works that do not create synopses within milliseconds (or a few seconds at most) of the arrival of raw messages [[4], [5]], (b) an efficient and flexible data transformation method that consumes data from a wide variety of heterogeneous sources and converts it to RDF, and (c) a spatio-temporal link discovery mechanism that integrates trajectory data with other contextual and weather data using spatio-temporal relations, an issue largely overlooked by state-of-the-art link discovery frameworks [[6], [7]] (see also [8] for a recent survey). Moreover, all of the above innovations are provided as an integrated prototype that consumes streaming (and archival) data and operates in real time.
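To make feature (a) more concrete, the following Python sketch shows one simple way an online compressor can decide, message by message, whether a position belongs in a synopsis: it keeps a point only when the heading or speed deviates noticeably from the last retained point, or when a reporting gap ends. This is an illustrative threshold-based filter, not SPARTAN's Synopses Generator; the `Position` record, the function names, and the threshold values (`turn_deg`, `speed_ratio`, `gap_s`) are hypothetical. It processes the stream of a single moving object; a keyed stream would run one such instance per object identifier.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Position:
    t: float        # timestamp (seconds)
    lon: float      # longitude (degrees)
    lat: float      # latitude (degrees)
    speed: float    # speed over ground (knots)
    heading: float  # heading (degrees, 0-360)


def heading_change(a: float, b: float) -> float:
    """Smallest absolute angle between two headings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def speed_changed(old: float, new: float, ratio: float) -> bool:
    if old == 0.0:
        return new > 0.0  # leaving a standstill always counts as a change
    return abs(new - old) / old > ratio


def critical_points(stream: Iterable[Position], turn_deg: float = 15.0,
                    speed_ratio: float = 0.25,
                    gap_s: float = 600.0) -> Iterator[tuple[str, Position]]:
    """Yield (annotation, position) for each position kept in the synopsis; others are dropped."""
    anchor: Optional[Position] = None  # last retained position
    prev_t: Optional[float] = None     # timestamp of the previous message, retained or not
    for p in stream:
        if anchor is None:
            label = "start"
        elif p.t - prev_t > gap_s:
            label = "gap_end"       # communication gap between consecutive messages
        elif heading_change(p.heading, anchor.heading) > turn_deg:
            label = "turn"          # cumulative turn since the last retained point
        elif speed_changed(anchor.speed, p.speed, speed_ratio):
            label = "speed_change"
        else:
            label = None            # redundant point, not part of the synopsis
        prev_t = p.t
        if label is not None:
            anchor = p
            yield label, p
```

Because each decision depends only on the last retained point and the previous timestamp, the state per object is constant, and a point is emitted or dropped as soon as it arrives, which is the property that matters for low-latency synopses.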

In summary, this paper makes the following contributions:

  • We propose a big data framework for the provision of streaming mobility data, transformed into RDF and enriched with other data sources, under low-latency requirements. Our framework entails the following specific innovations:

    • We show how to compress surveillance data in an online fashion, by constructing trajectory synopses that are both space-efficient and highly accurate, with low latency.

    • We present an efficient and flexible data transformation tool that accesses heterogeneous streaming and archival data from a variety of diverse data sources and generates RDF graph fragments in compliance with the datAcron ontology [[9], [10]].

    • We propose a generic spatio-temporal link discovery module that operates on streaming data, and efficiently discovers spatio-temporal relations, while supporting blocking techniques and different evaluation functions.

  • We evaluate our approach experimentally using a prototype implementation on top of big data technologies and real-life data, thereby providing evidence about the efficiency of the framework and its potential to provide enriched RDF streams of surveillance data.
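The transformation tool listed above can be illustrated with a template-driven mapping from a positional record to N-Triples. The sketch below is a minimal illustration under stated assumptions: the namespace `http://example.org/spartan-sketch#` and all property names (`ofMovingObject`, `hasTimestamp`, `hasLongitude`, and so on) are hypothetical placeholders, not terms of the datAcron ontology, and the record layout is invented for the example. Keeping the field-to-property mapping in data (`POSITION_TEMPLATE`) rather than in code is one way to obtain flexibility across sources: a new source needs a new template, not a new converter.

```python
from datetime import datetime, timezone

XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/spartan-sketch#"  # hypothetical namespace, not the datAcron ontology IRI

# Field-to-property mapping kept as data: (property local name, record field, XSD datatype).
POSITION_TEMPLATE = [
    ("hasLongitude", "lon", "double"),
    ("hasLatitude", "lat", "double"),
    ("hasSpeed", "speed", "double"),
    ("hasAnnotation", "annotation", "string"),
]


def iri(local: str) -> str:
    return f"<{EX}{local}>"


def literal(value, datatype: str) -> str:
    # Escape the characters N-Triples does not allow unescaped in quoted literals.
    text = (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{text}"^^<{XSD}{datatype}>'


def to_ntriples(rec: dict, template=POSITION_TEMPLATE) -> list[str]:
    """Map one positional record to N-Triples lines; fields absent from the record are skipped."""
    ts = datetime.fromtimestamp(rec["t"], tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    node = iri(f"node_{rec['id']}_{rec['t']}")
    triples = [
        (node, iri("ofMovingObject"), iri(f"object_{rec['id']}")),
        (node, iri("hasTimestamp"), literal(ts, "dateTime")),
    ]
    for prop, field, datatype in template:
        if field in rec:
            triples.append((node, iri(prop), literal(rec[field], datatype)))
    return [f"{s} {p} {o} ." for s, p, o in triples]
```

For example, `to_ntriples({"id": "237", "t": 0, "lon": 23.6, "lat": 37.9, "annotation": "turn"})` produces five lines, one per triple; the absent `speed` field is simply skipped.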

The rest of this paper is structured as follows: Section 2 reviews the related work and clarifies how our work advances the state-of-the-art. Section 3 presents the targeted problem setting and motivates our work. Section 4 crisply describes the datAcron ontology for the representation of semantic trajectories. Section 5 describes the overall semantic integration framework, and delves into the details of its components. Section 6 provides technical details on our prototype implementation using big data technologies. Section 7 demonstrates the efficiency of our framework by means of experimental evaluation using real-life datasets. Section 8 provides a discussion on how SPARTAN can be exploited for improving data analysis tasks. Finally, Section 9 concludes the paper and sketches future research directions.


Related work

There are efforts on semantic integration of streaming with archival data designed to operate on RDF, such as [[11], [12]], or efforts towards a framework for the integration of distributed heterogeneous streaming and stored data sources through ontological models, e.g. in [13]. Recently, in [[14], [15]], an approach for integration of streaming with static relational data has been proposed. The Graph of Things [16] targets an IoT setting where many sources provide data for integration and

Motivation & problem setting

Trajectory-based operations, which involve spatio-temporal data of moving entities, have become increasingly important in real-life applications, as they lead to increased safety and reduced cost [[46], [47]]. The key issue in achieving these targets is increasing the predictability of trajectories and of events related to the behavior of moving entities. Thus, several analysis tasks revolve around trajectories, including future location and trajectory prediction as well as complex event recognition and

The datAcron ontology

The datAcron ontology was developed to be used as a core ontology for the Maritime Situation Awareness (MSA) and Air Traffic Management (ATM) domains, towards supporting analysis tasks that exploit trajectories at various levels of analysis. Its development has been driven by ontologies related to our objectives (e.g. DUL, SimpleFeature,

The SPARTAN framework for semantic integration of spatio-temporal data

SPARTAN is a framework for semantic integration of streaming mobility data with other data sources. It comprises three main components, as illustrated in Fig. 5: (a) Synopses Generator, (b) Data Transformation, and (c) Link Discovery. The components correspond to fundamental steps in the big data analysis pipeline [49], namely (a) data acquisition, cleaning, and filtering, (b) data extraction and representation, and (c) data integration.
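For component (c), a common way to keep link discovery tractable on a stream is blocking: candidate targets are indexed in a spatial grid, and each incoming position is compared only with targets in its own and neighboring cells. The Python sketch below assumes a single relation, a hypothetical `nearTo` that holds when a moving object is within `max_dist_m` meters of a target whose validity interval contains the position's timestamp (for example, a weather cell valid for a forecast period). Class and field names are illustrative, not SPARTAN's; great-circle distance is computed with the haversine formula.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Iterator

EARTH_RADIUS_M = 6371000.0


@dataclass(frozen=True)
class Target:
    name: str
    lon: float
    lat: float
    t_start: float  # validity interval start (seconds)
    t_end: float    # validity interval end (seconds)


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


class GridLinker:
    """Hypothetical grid-blocked linker for a 'nearTo' spatio-temporal relation."""

    def __init__(self, targets: Iterable[Target], cell_deg: float, max_dist_m: float):
        # For the 3x3 probe to be complete, cell_deg must be at least max_dist_m expressed
        # in degrees of longitude at the highest latitude served; otherwise links can be missed.
        self.cell_deg = cell_deg
        self.max_dist_m = max_dist_m
        self.index = defaultdict(list)
        for tg in targets:
            self.index[self._cell(tg.lon, tg.lat)].append(tg)

    def _cell(self, lon: float, lat: float) -> tuple[int, int]:
        return (math.floor(lon / self.cell_deg), math.floor(lat / self.cell_deg))

    def links(self, obj_id: str, t: float, lon: float, lat: float) -> Iterator[tuple[str, str, str]]:
        cx, cy = self._cell(lon, lat)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # Blocking: only targets in the 3x3 neighborhood are evaluated.
                for tg in self.index.get((cx + dx, cy + dy), []):
                    in_time = tg.t_start <= t <= tg.t_end
                    if in_time and haversine_m(lon, lat, tg.lon, tg.lat) <= self.max_dist_m:
                        yield (obj_id, "nearTo", tg.name)
```

The cell size trades index granularity against completeness: smaller cells mean fewer distance evaluations per position, but the constructor comment states the lower bound below which the 3x3 probe can miss links.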

In brief, streaming positional data, which is the primary

The SPARTAN big data architecture

In this section, we present the design of the SPARTAN big data architecture, focusing on the implementation of individual components as well as the communication mechanism used for integrating the different components.
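Since the components communicate through Kafka, one design point worth illustrating is keyed partitioning: if every message about a moving object carries the object's identifier as its key, all of that object's messages land in the same partition, and Kafka preserves order within a partition, so a downstream stage holding per-object state (such as a compressor's last retained point) sees them in order. The class below is an in-memory stand-in written for this illustration, not the Kafka client API; the names `PartitionedTopicSketch` and `last_value_per_key` are hypothetical, and it hashes keys with `zlib.crc32`, whereas Kafka's default partitioner uses murmur2, so actual partition numbers would differ.

```python
import zlib
from typing import Iterator


class PartitionedTopicSketch:
    """In-memory stand-in for a partitioned topic; NOT the Kafka client API."""

    def __init__(self, num_partitions: int):
        self.partitions: list[list[tuple[str, object]]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition. Kafka's default partitioner uses murmur2, not CRC-32.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def publish(self, key: str, value: object) -> None:
        # Appending to a per-partition list preserves publish order within the partition.
        self.partitions[self.partition_for(key)].append((key, value))

    def consume(self, partition: int) -> Iterator[tuple[str, object]]:
        yield from self.partitions[partition]


def last_value_per_key(topic: PartitionedTopicSketch, partition: int) -> dict:
    """A downstream consumer keeping per-key state, e.g. the latest position of each object."""
    state = {}
    for key, value in topic.consume(partition):
        state[key] = value
    return state
```

Ordering holds only per key: messages of different objects may be interleaved or spread over partitions, which is harmless as long as every stateful stage keys its state by the same object identifier.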

Experimental evaluation

In this section, we first describe the main datasets used in the evaluation (Section 7.1). Then, we provide experimental results for the individual components in Section 7.2 in order to study their performance individually, and then present the empirical evaluation of the integrated prototype in Section 7.3.

Discussion

Following SPARTAN’s approach, raw trajectory data is transformed into multidimensional sequences (semantic trajectory data) that form a more realistic representation model of complex everyday life [1]; the mobility of vessels belongs to this broad class. Operating on such compressed but semantically enriched time series may facilitate several analysis tasks. For instance, clustering analysis may benefit from additional variables by incorporating the principle of divide-and-conquer via a semantic-aware

Conclusions and outlook

In this paper, we presented SPARTAN, a framework for real-time semantic integration of big mobility data with other data sources, aiming at providing enriched trajectories that are exploited by higher level analysis tasks. Our framework contains methods for data cleaning and filtering, data transformation, and link discovery, thereby offering an end-to-end solution to the problem of providing enriched streams of mobility data. In our future work, we intend to study in depth how the enriched

Acknowledgments

This work is supported by the datAcron project, Greece, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 687591 (http://datacron-project.eu).

Christos Doulkeridis received the B.Sc. degree in electrical engineering and computer science from the National Technical University of Athens and the M.Sc. and Ph.D. degrees in Information Systems from the Department of Informatics of Athens University of Economics and Business. He is currently an assistant professor in the Department of Digital Systems of the University of Piraeus. His research interests include parallel and distributed data management, and data analytics.

References (54)

  • G. Santipantakis, G. Vouros, A. Glenis, C. Doulkeridis, A. Vlachou, The datAcron ontology for semantic trajectories,...
  • G. Santipantakis, G. Vouros, C. Doulkeridis, A. Vlachou, G. Andrienko, N. Andrienko, G. Fuchs, J.M.C. Garcia, M.G....
  • D.L. Phuoc, M. Dao-Tran, J.X. Parreira, M. Hauswirth, A native and adaptive approach for unified processing of linked...
  • J. Calbimonte, Ó. Corcho, A.J.G. Gray, Enabling ontology-based access to streaming data sources, in: The Semantic Web -...
  • E. Kharlamov, Y. Kotidis, T. Mailis, C. Neuenstadt, C. Nikolaou, Ö.L. Özçep, C. Svingos, D. Zheleznyakov, S. Brandt, I....
  • E. Kharlamov, S. Brandt, E. Jiménez-Ruiz, Y. Kotidis, S. Lamparter, T. Mailis, C. Neuenstadt, Ö.L. Özçep, C. Pinkel, C....
  • D.L. Phuoc et al., The graph of things: A step towards the live knowledge graph of connected things, J. Web Sem. (2016)
  • D.F. Barbieri et al., C-SPARQL: a continuous query language for RDF data streams, Int. J. Sem. Comput. (2010)
  • D. Dell’Aglio, J. Calbimonte, E.D. Valle, Ó. Corcho, Towards a unified language for RDF stream query processing, in:...
  • L.O. Alvares, V. Bogorny, B. Kuijpers, J.A.F. de Macêdo, B. Moelans, A.A. Vaisman, A model for enriching trajectories...
  • M. Baglioni et al., Towards semantic interpretation of movement behavior
  • V. Bogorny et al., Constant - A conceptual data model for semantic trajectories of moving objects, Trans. GIS (2014)
  • T.P. Nogueira, H. Martin, Querying semantic trajectory episodes, in: Proc. of MobiGIS, 2015, pp....
  • D. Douglas et al., Algorithms for the reduction of the number of points required to represent a digitized line or its caricature, Canad. Cartogr. (1973)
  • C. Long et al., Trajectory simplification: On minimizing the direction-based error, Proc. VLDB Endow. (2014)
  • J. Muckell et al., Compression of trajectory data: a comprehensive evaluation and new approach, Geoinformatica (2014)
  • H. Cao et al., Spatio-temporal data reduction with deterministic error bounds, VLDB J. (2006)