Mobility Data Harmonisation: The TANGENT Solution

  • Open Access
  • 2026
  • OriginalPaper
  • Book chapter

Abstract

The TANGENT project, funded by the European Commission, is developing ICT tools to optimise traffic operations in a coordinated and dynamic way. The project focuses on gathering and harmonising data from various sources, including transport schedules, traffic flows, and user-generated data, to deliver warning and recommendation services. TANGENT's main challenges are data discovery, access, harmonisation, integration, and extraction. The project identified several metadata and data interoperability issues and addresses them with a set of purpose-built tools. The TANGENT Data Catalogue and the TANGENT Data API facilitate data discovery and access, while the TANGENT Reference Conceptual Model and the Semantic Harmonisation and Fusion Pipelines enable data harmonisation and integration. The project has successfully implemented a prototype that demonstrates the harmonisation of traffic data from different case study cities and shows the advantages of the any-to-one mapping approach. The solution is designed to be scalable and flexible, which makes it suitable for the diverse interoperability requirements of the TANGENT project.

1 Introduction

The TANGENT project, funded by the European Commission under the Horizon 2020 Programme, is developing new complementary ICT tools for optimising traffic operations in a coordinated and dynamic way from a multimodal perspective, considering automated and non-automated vehicles, passengers, and freight transport.
Data coming from intermodal mobility (transport schedules, transport network, traffic flows, etc.), captured by sensors deployed to monitor the transport network (trains, road traffic, ferries, etc.), or generated by users and vehicles will feed the TANGENT tools to determine the traffic conditions and to deliver warning and recommendation services. One of the project's main objectives is gathering data sources related to the TANGENT case study cities (i.e., Athens, Lisbon, Greater Manchester, and Rennes Metropole) and enabling their usage in the TANGENT tools. Defining a proper solution is not straightforward and must address the following challenges: (i) discover the data (Which data are available and where?), (ii) access the data (How to obtain access to the data?), (iii) harmonise the data (How to convert the data according to a reference data model?), (iv) integrate the data (How to merge data from different sources?), and (v) extract the data (How to consume harmonised/integrated data?).
The analysis of the proposed challenges considering the TANGENT case studies has highlighted several metadata and data interoperability issues to be addressed. As shown in Fig. 1, data are available in different data catalogues, i.e., web portals that describe and often host a set of data sources (datasets and data services). Considering different data catalogues, each of them may contain:
  • datasets in different formats and/or using different data models (A, B, C);
  • data services relying on heterogeneous specifications and technologies (I, II);
  • data sources described according to different metadata specifications (d, e).
A single solution to the abovementioned challenges cannot be identified, since no single interoperability problem can be formulated. Indeed, the identified data interoperability issues are widely heterogeneous and pose various requirements that can only be addressed by a set of appropriately configured tools [4].
Fig. 1. Metadata and data interoperability issues
The following sections describe the design of the TANGENT solution for data harmonisation and fusion, realised as a set of tools to address the various data interoperability requirements elicited by the project.

2 TANGENT Solution for Data Harmonisation and Fusion

This section describes the components of the TANGENT solution identified to face each of the mentioned challenges: discover, access, harmonise, integrate, and extract.
The European Commission is also addressing the first two challenges (discover and access) by implementing National Access Points (NAPs) for transportation data. The NAP concept builds on that of a data catalogue, i.e., a digital platform that facilitates the sharing of data sources and their findability by other stakeholders. However, each Member State has adopted a different approach to implementing its NAP, which creates interoperability issues at the European level. For this reason, the NAPCORE project is currently working on coordinating and harmonising such platforms across Europe. One crucial objective is supporting the findability of data in each mobility data platform and defining a uniform mechanism to access the data sources. Adopting structured metadata descriptors according to well-known vocabularies (e.g., DCAT-AP [2]) is extremely important to facilitate the implementation of search functionalities within one or multiple data catalogues. Moreover, proper data governance must be defined to regulate the usage of the catalogue among the different stakeholders involved. Finally, data catalogues should support the harmonisation of the technological means to access the data sources and/or provide detailed documentation to assist the data user.
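As a rough illustration of why structured descriptors help findability, the sketch below filters a small in-memory catalogue by keyword. The property names `dct:identifier`, `dct:title`, and `dcat:keyword` come from the DCAT vocabulary; the records and the `search` helper are hypothetical and do not reflect the content of any actual NAP or of the TANGENT Data Catalogue.

```python
from typing import Dict, List

# Hypothetical descriptors; a few DCAT property names are used as plain dict keys.
CATALOGUE: List[Dict[str, object]] = [
    {
        "dct:identifier": "example-journey-times",
        "dct:title": "Real-time journey times",
        "dcat:keyword": ["traffic", "journey time", "real-time"],
    },
    {
        "dct:identifier": "example-timetables",
        "dct:title": "Public transport timetables",
        "dcat:keyword": ["schedule", "public transport"],
    },
]


def search(catalogue: List[Dict[str, object]], keyword: str) -> List[str]:
    """Return the identifiers of the descriptors tagged with the given keyword."""
    wanted = keyword.lower()
    return [
        str(d["dct:identifier"])
        for d in catalogue
        if wanted in (k.lower() for k in d["dcat:keyword"])
    ]


print(search(CATALOGUE, "traffic"))
```

Because every record uses the same property names, the same query works across catalogues that adopt the same metadata profile.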
For these reasons, the first two components of the TANGENT solution for data harmonisation and fusion are:
  • The TANGENT Data Catalogue: a digital platform (i) describing uniformly all the relevant data sources collected for the TANGENT case studies according to the TangentDCAT-AP metadata specification, and (ii) implementing governance mechanisms according to the TANGENT Data-Sharing Governance model. A complete description of the metadata specification and the governance model is available in [1].
  • The TANGENT Data API: a uniform API to access data sources and define integrations with the overall TANGENT solution. The basic idea of the TANGENT Data API is to make it easier for the final user to access the data without locating and accessing several data portals. The user can access a data source by knowing the endpoint at which the TANGENT Data API is located and by retrieving the identifier of the data source to be accessed from the TANGENT Data Catalogue. The TANGENT Data API handles authorisation mechanisms for the different data sources. Moreover, in cases where a data source that is published online should be filtered according to the specificities of a case study, the TANGENT Data API is already configured for providing access only to the relevant data (e.g., adding parameters to filter data according to a defined temporal/geographical scope).
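The access pattern described for the TANGENT Data API (one known endpoint plus a data source identifier taken from the catalogue) might look as follows from the client side. This is only a sketch: the base URL, the `/sources/` path, and the `start`/`end` filter parameters are assumptions made for illustration, since the text does not specify the actual interface.

```python
from urllib.parse import urlencode

# Hypothetical base endpoint; the real TANGENT Data API address is not given in the text.
BASE_URL = "https://data-api.example.org"


def build_request_url(source_id: str, **filters: str) -> str:
    """Build the URL for one data source, with optional (hypothetical) scope filters."""
    url = f"{BASE_URL}/sources/{source_id}"
    if filters:
        # Sort the parameters so the query string is deterministic.
        url += "?" + urlencode(sorted(filters.items()))
    return url


print(build_request_url("MANCH_DATA_01"))
print(build_request_url("RENN_DATA_01", start="2023-01-01", end="2023-01-31"))
```

The client only needs the identifier; where the data are actually hosted and how authorisation is handled stay behind the API.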
The remaining three data interoperability challenges (harmonise, integrate, and extract) require a flexible solution that can be configured to address heterogeneous requirements in terms of: (i) schema and data transformation to obtain syntactic (structural) and semantic interoperability and (ii) integration with existing information systems as data sources (i.e., components generating or storing the data) and/or data sinks (i.e., components consuming or archiving the data).
Different approaches can be exploited and implemented, spanning from ad-hoc solutions targeting a specific scenario to more general and scalable solutions supporting multiple stakeholders and data representations. The latter category includes the any-to-one mapping approach, which reduces the number of mappings, i.e., translations from one representation to another, that different stakeholders need to implement to achieve interoperability [1, 4]. This approach selects a reference model for the domain of interest (e.g., Transmodel), and each stakeholder is responsible for defining mappings from their own data representation to the reference model and vice versa.
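A quick count illustrates the reduction, under the assumption that every pair of representations needs a translation in both directions and that the reference model is not itself one of the $n$ stakeholder representations:

```latex
\[
  M_{\text{pairwise}}(n) = n(n-1), \qquad
  M_{\text{any-to-one}}(n) = 2n, \qquad
  2n < n(n-1) \iff n > 3 .
\]
```

For example, with $n = 5$ representations, pairwise translation needs 20 mappings, while the any-to-one approach needs 10.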
A specific application of any-to-one mappings relies on using Semantic Web technologies and defining the central model as a reference ontology. The main advantage of this approach is the definition of an interoperable and integrated Knowledge Graph, modelled according to a formal reference ontology, as an additional valuable product of the conversion procedure [4].
The implementation of the semantic any-to-one mapping approach requires the definition of two procedures: (i) a lifting procedure from the source data representation to the reference ontology and (ii) a lowering procedure from the reference ontology to the target data representation.
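The two procedures can be sketched in a few lines, with RDF triples represented as plain Python tuples. The `ex:` terms, the source field names (`id`, `tt_sec`), and the target field names are hypothetical placeholders; they are not taken from the TANGENT Reference Conceptual Model or from any project format.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def lift(record: Dict[str, str]) -> Set[Triple]:
    """Lifting: source record -> triples using (hypothetical) reference terms."""
    subject = f"ex:measurement/{record['id']}"
    return {
        (subject, "rdf:type", "ex:TravelTimeMeasurement"),
        (subject, "ex:durationSeconds", record["tt_sec"]),
    }


def lower(graph: Set[Triple]) -> List[Dict[str, str]]:
    """Lowering: triples -> records in a (hypothetical) target format."""
    durations = {s: o for s, p, o in graph if p == "ex:durationSeconds"}
    return [
        {"measurementId": s.rsplit("/", 1)[-1], "travelTime": durations[s]}
        for s, p, o in sorted(graph)
        if p == "rdf:type" and o == "ex:TravelTimeMeasurement"
    ]


graph = lift({"id": "m1", "tt_sec": "95"})
print(lower(graph))
```

The two functions share only the reference terms, so either side can change without touching the other.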
Different approaches for lifting and lowering can be suitable considering a specific scenario. Moreover, the harmonisation, integration and extraction process may require the definition of custom pre- and post-processing to manage interoperability issues. Therefore, composing and configuring different components should be possible considering the specific requirements for integrating certain data sources.
Semantic-based ETL (Extract, Transform and Load) tools are proposed in the literature to address the problem of defining composable procedures based on Semantic Web technologies. Chimera [3] is an open-source solution based on Apache Camel that enables the definition of semantic data transformation pipelines by composing different components for knowledge graph construction, transformation, validation, and exploitation. The advantage of Chimera is that the integration with Apache Camel provides a wide set of production-ready components to integrate pipelines with heterogeneous systems. Moreover, custom components can be easily defined and integrated within a pipeline (e.g., to implement custom extraction/filtering approaches). For these reasons, Chimera has been selected to implement the semantic any-to-one mapping approach, considering the integration requirements of the TANGENT project.
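The idea of composing a pipeline out of interchangeable components can be illustrated with a plain Python analogy. The sketch below does not use the API of Chimera or Apache Camel; the `pipeline` helper and the three steps are hypothetical stand-ins for pre-processing, lifting, and validation components.

```python
from functools import reduce
from typing import Callable

Step = Callable[[object], object]


def pipeline(*steps: Step) -> Step:
    """Compose steps left to right into a single callable."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


def drop_empty(rows):
    """Pre-processing: discard rows without a value."""
    return [r for r in rows if r.get("value") is not None]


def to_triples(rows):
    """Lifting stand-in: one triple per row, using a hypothetical ex: term."""
    return [(f"ex:obs/{r['id']}", "ex:value", r["value"]) for r in rows]


def validate(triples):
    """Validation: reject non-numeric observation values."""
    if not all(isinstance(o, (int, float)) for _, _, o in triples):
        raise ValueError("non-numeric observation value")
    return triples


harmonise = pipeline(drop_empty, to_triples, validate)
print(harmonise([{"id": "a", "value": 3}, {"id": "b", "value": None}]))
```

Adding a custom filtering step amounts to inserting one more function into the call to `pipeline`.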
Accordingly, two additional elements are identified to support the TANGENT solution for data harmonisation and fusion:
  • The TANGENT Reference Conceptual Model: a model defining a reference ontology for the transportation-related data handled within TANGENT.
  • The Semantic Harmonisation and Fusion Pipelines: a set of workflows defined with Chimera to fulfil the integration requirements associated with the TANGENT data sources.

3 A Prototype of the TANGENT Solution for Data Harmonisation and Fusion

This section discusses a set of illustrative semantic conversion pipelines developed within the project, considering real data sources from case studies and enabling the harmonisation of traffic data according to a uniform output format.
The following data sources from two different case study cities (i.e., Greater Manchester and Rennes Metropole) are considered:
  • MANCH_DATA_01: a data service exposing real-time journey times data for the Greater Manchester case study area in a custom JSON format.
  • MANCH_DATA_02: a data service exposing the position of sensors measuring the journey times in MANCH_DATA_01 in a custom JSON format.
  • RENN_DATA_01: a data service exposing real-time road traffic data for the Rennes Metropole case study area in a custom CSV format.
The three custom formats are different but represent similar data associated with real-time traffic measurements. The three data sources contain travel time data, while only RENN_DATA_01 includes additional information regarding average speeds and traffic status.
The harmonisation requirement identified is to represent the information of the three data sources using a uniform JSON format based on DATEX II semantics that can be interpreted by other TANGENT services. The resulting data flow should contain the real-time measurements and the position of the sensor that carried out the measurements.
The three data sources are described in the TANGENT Data Catalogue, using a uniform metadata descriptor, and exposed through the TANGENT Data API, which filters the incoming data according to the case study areas. The two data sources from Greater Manchester must be merged to obtain the needed data, while the Rennes Metropole data source already integrates both the measurements and the position of the sensors. The data in the selected data sources are lifted to their RDF representation (according to the TANGENT Reference Conceptual Model) by defining a set of declarative mappings. A single lowering mapping is needed to extract data from the produced knowledge graph and lower the information to the common DATEX II-based JSON format. This example shows the advantage of the any-to-one mapping approach: only the lowering mapping needs to be modified if the expected target format changes, since the lifting operation is completely decoupled. Moreover, the intermediate representation as an RDF graph facilitates merging different data sources and querying an integrated dataset.
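The data flow described above can be sketched as follows, again with triples as plain tuples. The payloads, field names, `ex:` terms, and output layout are all assumptions made for illustration: the text does not describe the custom formats, and the output shown here is not the actual DATEX II-based format.

```python
import csv
import io
import json

# Hypothetical payloads; the real custom formats are not described in the text.
MANCH_TIMES = '[{"sensor": "S1", "journey_time": 120}]'
MANCH_SENSORS = '[{"sensor": "S1", "lat": 53.48, "lon": -2.24}]'
RENNES_CSV = "id;travel_time;lat;lon\nR7;80;48.11;-1.68\n"


def lift_manchester_times(text):
    return {(f"ex:sensor/{r['sensor']}", "ex:travelTime", r["journey_time"])
            for r in json.loads(text)}


def lift_manchester_sensors(text):
    triples = set()
    for r in json.loads(text):
        s = f"ex:sensor/{r['sensor']}"
        triples |= {(s, "ex:lat", r["lat"]), (s, "ex:lon", r["lon"])}
    return triples


def lift_rennes(text):
    triples = set()
    for r in csv.DictReader(io.StringIO(text), delimiter=";"):
        s = f"ex:sensor/{r['id']}"
        triples |= {
            (s, "ex:travelTime", int(r["travel_time"])),
            (s, "ex:lat", float(r["lat"])),
            (s, "ex:lon", float(r["lon"])),
        }
    return triples


def lower(graph):
    """Single lowering step shared by all sources."""
    by_subject = {}
    for s, p, o in graph:
        by_subject.setdefault(s, {})[p] = o
    return [
        {"sensor": s.rsplit("/", 1)[-1],
         "travelTime": props["ex:travelTime"],
         "position": {"lat": props["ex:lat"], "lon": props["ex:lon"]}}
        for s, props in sorted(by_subject.items())
    ]


# Set union merges the graphs; the shared subject identifier fuses the two Manchester sources.
graph = (lift_manchester_times(MANCH_TIMES)
         | lift_manchester_sensors(MANCH_SENSORS)
         | lift_rennes(RENNES_CSV))
print(json.dumps(lower(graph), indent=2))
```

If the target format changed, only `lower` would need to be rewritten; the three lifting functions would stay as they are.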
The designed pipelines were implemented in the following steps:
  • Analysis of the TANGENT Reference Conceptual Model to identify the relevant classes and properties to be used for mapping the concepts specified in the input data sources.
  • Encoding of the lifting mappings from the input formats (JSON and CSV) to their RDF representation according to the identified classes and properties.
  • Definition of the lowering mappings from the RDF representation to the output JSON format relying on DATEX II-based semantics. The harmonised and possibly fused data are accessed from the knowledge graph, and the lowering mappings are defined according to the expected output format.
  • Definition and configuration of the semantic harmonisation and fusion pipelines using the Chimera components and Apache Camel.
The pipelines can then be executed and exposed as a data service listed in the TANGENT Data Catalogue and integrated with the TANGENT Data API.

4 Conclusions

This paper has described the design of the TANGENT solution for data harmonisation and fusion to address heterogeneous interoperability issues related to data discovery, access, harmonisation, integration, and extraction. The solution has been realised as a set of tools (i.e., the TANGENT Data Catalogue, the TANGENT Data API, the TANGENT Reference Conceptual Model, and the Semantic Harmonisation and Fusion Pipelines) to address the various data interoperability requirements elicited by the project. Considering the state-of-the-art and best practices, the solution has been designed by applying the any-to-one centralised approach for semantic interoperability solutions, enabling data conversions and exchange with unambiguous and shared meaning.
The solution has been released, and its integration with the overall TANGENT solution is ongoing.

Acknowledgements

The presented research has been supported by the TANGENT project (Grant Agreement no. 955273), co-funded by the European Commission under the Horizon 2020 Framework Programme.
Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Authors: Marco Comerio, Andrea Fiano, Marco Grassi, Mario Scrocca
Copyright year: 2026
DOI: https://doi.org/10.1007/978-3-032-06763-0_50

References

1. Comerio, M., Azzini, A., Scrocca, M., Fiano, A., Metta, S.: D2.2 Data-sharing governance model (2022). https://tangent-h2020.eu/wp-content/uploads/2023/01/TANGENT_D2.2_Data-Sharing_Governance_Model.pdf
2. European Commission: DCAT Application profile for data portals in Europe (DCAT-AP). Version 3.0.0 (2023). https://semiceu.github.io//DCAT-AP/releases/3.0.0
3. Grassi, M., Scrocca, M., Carenini, A., Comerio, M., Celino, I.: Composable semantic data transformation pipelines with Chimera. In: Proceedings of the 4th International Workshop on Knowledge Graph Construction. CEUR Workshop Proceedings (2023)
4. Sadeghi, M., et al.: SPRINT: Semantics for PerfoRmant and scalable INteroperability of multimodal transport. In: 8th Transport Research Arena TRA 2020, pp. 1–10 (2020)
5. Scrocca, M., Comerio, M., Carenini, A., Celino, I.: Turning transport data to comply with EU standards while enabling a multimodal transport knowledge graph. In: Proceedings of the 19th International Semantic Web Conference, pp. 411–429. Springer, Heidelberg (2020)
6. Vetere, G., Lenzerini, M.: Models for semantic interoperability in service-oriented architectures. IBM Syst. J. 44(4), 887–903 (2005)