2024 | Book

The Semantic Web

21st International Conference, ESWC 2024, Hersonissos, Crete, Greece, May 26–30, 2024, Proceedings, Part II

Edited by: Albert Meroño Peñuela, Anastasia Dimou, Raphaël Troncy, Olaf Hartig, Maribel Acosta, Mehwish Alam, Heiko Paulheim, Pasquale Lisena

Publisher: Springer Nature Switzerland

Book series: Lecture Notes in Computer Science

About this book

The two-volume set LNCS 14664 and 14665 constitutes the refereed proceedings of the 21st International Conference on The Semantic Web, ESWC 2024, held in Hersonissos, Crete, Greece, during May 26–30, 2024.

The 32 full papers presented were carefully reviewed and selected from 138 submissions. They cover theoretical, analytical, and empirical aspects of the Semantic Web, semantic technologies, knowledge graphs, and semantics on the Web in general.

Table of contents

Frontmatter

Resource

Frontmatter
PyGraft: Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
Abstract
Knowledge graphs (KGs) have emerged as a prominent data representation and management paradigm. Usually underpinned by a schema (e.g., an ontology), KGs capture not only factual information but also contextual knowledge. For some tasks, a few KGs have established themselves as standard benchmarks. However, recent works outline that relying on a limited collection of datasets is not sufficient to assess the generalization capability of an approach. In some data-sensitive fields such as education or medicine, access to public datasets is even more limited. To remedy these issues, we release PyGraft, a Python-based tool that generates highly customized, domain-agnostic schemas and KGs. The synthesized schemas encompass various RDFS and OWL constructs, while the synthesized KGs emulate the characteristics and scale of real-world KGs. Logical consistency of the generated resources is ultimately ensured by running a description logic (DL) reasoner. By providing a way to generate both a schema and a KG in a single pipeline, PyGraft aims to enable the generation of a more diverse array of KGs for benchmarking novel approaches in areas such as graph-based machine learning (ML) or, more generally, KG processing. In graph-based ML in particular, this should foster a more holistic evaluation of model performance and generalization capability, thereby going beyond the limited collection of available benchmarks. PyGraft is available at: https://github.com/nicolas-hbt/pygraft.
Nicolas Hubert, Pierre Monnin, Mathieu d’Aquin, Davy Monticolo, Armelle Brun
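The abstract notes that the logical consistency of the generated resources is verified with a description logic reasoner. PyGraft's own API is not shown here; as an illustration only, the following minimal sketch checks the consistency of a generated schema with the owlready2 package and the HermiT reasoner (the file path is hypothetical):

    from owlready2 import get_ontology, sync_reasoner, OwlReadyInconsistentOntologyError

    # Load a schema generated by PyGraft (hypothetical path).
    onto = get_ontology("file:///path/to/schema.owl").load()

    try:
        with onto:
            sync_reasoner()  # runs the HermiT DL reasoner (requires Java)
        print("Schema is logically consistent.")
    except OwlReadyInconsistentOntologyError:
        print("Schema is inconsistent.")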
NORIA-O: An Ontology for Anomaly Detection and Incident Management in ICT Systems
Abstract
Large-scale Information and Communications Technology (ICT) systems give rise to difficult situations such as handling cascading failures and detecting complex malicious activities occurring on multiple services and network layers. For network supervision, managing these situations while ensuring a high standard of quality of service and security requires a comprehensive view of how communication devices are interconnected and how they are performing. However, this information is spread across heterogeneous data sources, which raises information integration challenges. Existing data models can represent computing resources and how they are allocated; however, to date, there is no model that describes the inter-dependencies between the structural, dynamic, and functional aspects of a network infrastructure. In this paper, we propose the NORIA ontology, developed together with network and cybersecurity experts, to describe an infrastructure, its events, and the diagnosis and repair actions performed during incident management. A use case describing a fictitious failure shows how this ontology can model complex situations and serve as a basis for anomaly detection and root cause analysis. The ontology is available at https://w3id.org/noria and empowers the largest telco operator in France.
Lionel Tailhardat, Yoan Chabot, Raphaël Troncy
FlexRML: A Flexible and Memory Efficient Knowledge Graph Materializer
Abstract
We present FlexRML, a flexible and memory efficient software resource for interpreting and executing RML mappings. As a knowledge graph materializer, FlexRML can operate on a wide range of systems, from cloud-based environments to edge devices, as well as resource-constrained IoT devices and real-time microcontrollers. The primary goal of FlexRML is to balance memory efficiency with fast mapping execution. This is achieved by using C++ for the implementation and a result size estimation algorithm that approximates the number of N-Quads generated and, based on the estimate, optimizes bit sizes and data structures used to save memory in preparation for mapping execution. Our evaluation shows that FlexRML’s adaptive bit size and data structure selection results in higher memory efficiency compared to conventional methods. When benchmarked against state-of-the-art RML processors, FlexRML consistently shows lower peak memory consumption across different datasets while delivering faster or comparable execution times.
Resource type: RML Processor
License: GNU AGPLv3
Michael Freund, Sebastian Schmid, Rene Dorsch, Andreas Harth
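FlexRML itself is implemented in C++; as a conceptual illustration only (this is not FlexRML code), the abstract's key idea of choosing bit sizes from a result-size estimate can be sketched in Python as follows:

    import numpy as np

    def id_dtype_for(estimated_nquads: int) -> np.dtype:
        """Pick the smallest unsigned integer type able to index the
        estimated number of generated N-Quads (conceptual sketch)."""
        if estimated_nquads < 2**8:
            return np.dtype(np.uint8)
        if estimated_nquads < 2**16:
            return np.dtype(np.uint16)
        if estimated_nquads < 2**32:
            return np.dtype(np.uint32)
        return np.dtype(np.uint64)

    # An estimate of 50,000 output N-Quads fits into 16-bit identifiers.
    print(id_dtype_for(50_000))  # uint16

Smaller identifier widths shrink the in-memory dictionaries and term tables, which is the memory saving the abstract describes.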
IICONGRAPH: Improved Iconographic and Iconological Statements in Knowledge Graphs
Abstract
Iconography and iconology are fundamental domains when it comes to understanding artifacts of cultural heritage (CH). Iconography deals with the study and interpretation of visual elements depicted in artifacts and their symbolism, while iconology delves deeper, exploring the underlying cultural and historical meanings. Despite the advances in representing CH with Linked Open Data (LOD), recent studies show persistent gaps in the representation of iconographic and iconological statements in current knowledge graphs (KGs). To address them, this paper presents IICONGRAPH, a KG that was created by refining and extending the iconographic and iconological statements of ArCo (the Italian KG of CH) and Wikidata. The development of IICONGRAPH was also driven by a series of requirements emerging from research case studies expressed in competency questions (CQs) that were unattainable in the non-reengineered versions of the KGs. The evaluation results demonstrate that IICONGRAPH not only outperforms ArCo and Wikidata through domain-specific assessments from the literature but also serves as a robust platform for answering the formulated CQs. IICONGRAPH is released and documented in accordance with the FAIR principles to guarantee the resource’s reusability. The algorithms used to create it and assess the CQs have also been made available to ensure transparency and reproducibility. While future work focuses on ingesting more data into the KG, and on implementing it as a backbone of LLM-based question answering systems, the current version of IICONGRAPH still emerges as a valuable asset, contributing to the evolving landscape of CH representation within KGs, the Semantic Web, and beyond.
Bruno Sartini
VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph
Abstract
The availability of vast amounts of visual data with diverse and rich features is a key factor for developing, verifying, and benchmarking advanced computer vision (CV) algorithms and architectures. Most visual datasets are created and curated for specific tasks or with a limited data distribution for very specific fields of interest, and there is no unified approach to manage and access them across diverse sources, tasks, and taxonomies. This not only creates unnecessary overheads when building robust visual recognition systems, but also introduces biases into learning systems and limits the capabilities of data-centric AI. To address these problems, we propose the Vision Knowledge Graph (VisionKG), a novel resource that interlinks, organizes and manages visual datasets via knowledge graphs and Semantic Web technologies. It can serve as a unified framework facilitating simple access to and querying of state-of-the-art visual datasets, regardless of their heterogeneous formats and taxonomies. One of the key differences between our approach and existing methods is that VisionKG is not only based on metadata but also utilizes a unified data schema and external knowledge bases to integrate, interlink, and align visual datasets. It enriches the semantic descriptions and interpretation at both image and instance levels and offers data retrieval and exploratory services via SPARQL and natural language empowered by Large Language Models (LLMs). VisionKG currently contains 617 million RDF triples that describe approximately 61 million entities, which can be accessed at https://vision.semkg.org and through APIs. With the integration of 37 datasets and four popular computer vision tasks, we demonstrate its usefulness across various scenarios when working with computer vision pipelines.
Jicheng Yuan, Anh Le-Tuan, Manh Nguyen-Duc, Trung-Kien Tran, Manfred Hauswirth, Danh Le-Phuoc
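The abstract states that VisionKG can be queried via SPARQL at https://vision.semkg.org. A minimal query sketch with the SPARQLWrapper package could look as follows; the exact endpoint path is an assumption, since the abstract only gives the portal URL:

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Endpoint path is assumed; the abstract only names https://vision.semkg.org
    sparql = SPARQLWrapper("https://vision.semkg.org/sparql")
    sparql.setQuery("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")
    sparql.setReturnFormat(JSON)

    results = sparql.query().convert()
    for binding in results["results"]["bindings"]:
        print(binding)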
OfficeGraph: A Knowledge Graph of Office Building IoT Measurements
Abstract
In order to support the global energy transition, smart building management provides opportunities to increase efficiency and comfort. In practice, real-world smart buildings use combinations of heterogeneous IoT devices, and a need for (knowledge graph-enabled) interoperability solutions has been established. While ontologies and synthetic datasets are available, a real-world, large-scale and diverse knowledge graph has so far not been available. In this paper, we present OfficeGraph, a knowledge graph expressed in the SAREF ontology containing over 14 million sensor measurements from 444 heterogeneous devices, collected over a period of 11 months in a seven-story office building. We describe the procedure of mapping the original sensor measurements to RDF and how links to external linked data are established. We describe the resulting knowledge graph, consisting of 90 million RDF triples, and its structural and semantic features. Several use cases of the knowledge graph are shown: a) various realistic data analysis use cases based on competencies identified by building managers, and b) an existing machine learning experiment where we replace the original dataset with OfficeGraph.
Roderick van der Weerdt, Victor de Boer, Ronald Siebes, Ronnie Groenewold, Frank van Harmelen
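Since OfficeGraph is expressed in the SAREF ontology, measurements can be retrieved with a standard SPARQL query. The sketch below uses rdflib and SAREF core terms; the dump file name is hypothetical, and the exact modelling in OfficeGraph may differ:

    from rdflib import Graph

    g = Graph()
    g.parse("officegraph.nt")  # hypothetical local dump file

    query = """
    PREFIX saref: <https://saref.etsi.org/core/>
    SELECT ?measurement ?value ?time WHERE {
        ?measurement a saref:Measurement ;
                     saref:hasValue ?value ;
                     saref:hasTimestamp ?time .
    } LIMIT 10
    """
    for row in g.query(query):
        print(row.measurement, row.value, row.time)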
Musical Meetups Knowledge Graph (MMKG): A Collection of Evidence for Historical Social Network Analysis
Abstract
Knowledge Graphs (KGs) have emerged as a valuable tool for supporting humanities scholars and cultural heritage organisations. In this resource paper, we present the Musical Meetups Knowledge Graph (MMKG), a collection of evidence of historical collaborations between personalities relevant to the music history domain. We illustrate how we built the KG with a hybrid methodology that combines knowledge engineering with natural language processing, including the use of Large Language Models (LLMs), machine learning, and other techniques, to identify the constituent elements of a historical meetup. MMKG is a network of historical meetups extracted from ~33k biographies collected from Wikipedia, focused on European musical culture between 1800 and 1945. We discuss how, by providing a structured representation of social interactions, MMKG supports digital humanities applications and music historians’ research, teaching, and learning.
Alba Morales Tirado, Jason Carvalho, Marco Ratta, Chukwudi Uwasomba, Paul Mulholland, Helen Barlow, Trevor Herbert, Enrico Daga
Generate and Update Large HDT RDF Knowledge Graphs on Commodity Hardware
Abstract
HDT is a popular compressed file format to store, share and query large RDF Knowledge Graphs (KGs). While all these operations are possible in low hardware settings (i.e. on a standard laptop), the generation and updating of HDT files come with a significant hardware cost, especially in terms of memory and disk usage.
In this paper, we present a new tool leveraging HDT, namely k-HDTDiffCat, that a) reduces the memory and disk footprint of HDT file creation and b) removes triples from an existing HDT file, thus allowing updates.
We show that, in a system with 8 times less memory, we can achieve HDT file generation in almost the same time as existing methods. Moreover, our system can remove triples from an HDT file, catering for updates. This operation is possible without uncompressing the original data (as was the case with the original HDT tools) and while keeping memory consumption low.
While HDT was already suited for storing, exchanging and querying large Knowledge Graphs in low hardware settings, we add the novel ability to generate and update HDT files in these settings as well. As a side effect, HDT becomes an ideal indexing structure for large KGs in low hardware settings, making them more accessible to the community.
In particular, we show that we can compress the whole Wikidata graph, the largest knowledge graph currently available, on a standard laptop with 16 GB of RAM, as well as generate Wikidata indexes that are at most 24 hours behind the live Wikidata endpoint.
Antoine Willerval, Dennis Diefenbach, Angela Bonifati
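k-HDTDiffCat itself handles generation and updates; once an HDT file exists, it can be queried from Python with the rdflib-hdt package, as in the minimal sketch below (the file name is hypothetical):

    from rdflib import Graph
    from rdflib_hdt import HDTStore

    # Open an existing HDT file as a read-only rdflib store.
    store = HDTStore("dataset.hdt")  # hypothetical file name
    g = Graph(store=store)

    # Iterate over triples directly from the compressed file.
    for s, p, o in g.triples((None, None, None)):
        print(s, p, o)
        break  # just show the first triple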
SMW Cloud: A Corpus of Domain-Specific Knowledge Graphs from Semantic MediaWikis
Abstract
Semantic wikis have become an increasingly popular means of collaboratively managing Knowledge Graphs. They are powered by platforms such as Semantic MediaWiki and Wikibase, both of which enable MediaWiki to store and publish structured data. While there are many semantic wikis currently in use, there has been little effort to collect and analyse their structured data, nor to make it available to the research community. This paper seeks to address this gap by systematically collecting structured data from an extensive corpus of Semantic-MediaWiki-powered portals and providing an in-depth analysis of the ontological diversity (and re-use) amongst these wikis using a variety of ontological metrics. Our paper aims to demonstrate that semantic wikis are a valuable and extensive part of Linked Open Data (LOD) and may in fact be considered an active “sub-cloud” of their own within the LOD ecosystem, one that can provide useful insights into the evolution of small and medium-sized domain-specific Knowledge Graphs.
Daniil Dobriy, Martin Beno, Axel Polleres
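Semantic MediaWiki exposes a page's structured data as RDF through its built-in Special:ExportRDF page, which is one way such a corpus could be harvested; whether SMW Cloud uses this route is an assumption. A minimal fetch sketch (wiki URL and page name are hypothetical):

    import requests
    from rdflib import Graph

    # Hypothetical wiki; Special:ExportRDF is SMW's built-in RDF export.
    url = "https://example-wiki.org/wiki/Special:ExportRDF/Main_Page"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()

    g = Graph()
    g.parse(data=resp.text, format="xml")  # the export is RDF/XML
    print(len(g), "triples")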
SousLeSens - A Comprehensive Suite for the Industrial Practice of Semantic Knowledge Graphs
Abstract
Over recent decades, the advancement of semantic web technologies has underscored the increasing importance of tools dedicated to developing and managing the foundational components of the semantic web stack. Addressing these evolving needs, a variety of tools have emerged from research and development projects in academia as well as from commercial software vendors. These tools offer a diverse range of services tailored to the management of various aspects of semantic knowledge graphs. Despite this proliferation, feedback from stakeholders involved in publicly and privately funded projects has highlighted notable shortcomings in existing tools. These gaps become evident in two key areas: firstly, the user experience struggles to scale up to industrial-level data practices and knowledge engineering methodologies; secondly, a lack of interoperability and compatibility among the existing task-specific tools leads to elevated costs and effort. This paper introduces a novel semantic knowledge management ecosystem embodied in a suite of tools collectively known as “SousLeSens” (SLS). Unlike its counterparts, SLS not only provides comprehensive coverage of typical knowledge engineering tasks while adhering to best practices for ensuring quality, but also offers a purely visual (no-code to minimal-code) interface. This feature is particularly well-suited for handling large-scale, industry-grade semantic data models. The paper delves into the establishment of requirements for knowledge engineering tools and services, derived from recent stakeholder surveys. It then presents the SLS toolkit, elucidating its architecture and operational protocols. Finally, the paper validates the toolkit’s capabilities by comparing it with existing tools against the predefined requirements and illustrating various use cases.
Claude Fauconnet, Jean-Charles Leclerc, Arkopaul Sarkar, Mohamed Hedi Karray
MLSea: A Semantic Layer for Discoverable Machine Learning
Abstract
With the Machine Learning (ML) field rapidly evolving, ML pipelines continuously grow in numbers, complexity and components. Online platforms (e.g., OpenML, Kaggle) aim to gather and disseminate ML experiments. However, available knowledge is fragmented with each platform representing distinct components of the ML process or intersecting components but in different ways. To address this problem, we leverage semantic web technologies to model and integrate ML datasets, experiments, software and scientific works into MLSea, a resource consisting of: (i) MLSO, an ontology that models ML datasets, pipelines and implementations; (ii) MLST, taxonomies with collections of ML knowledge formulated as controlled vocabularies; and (iii) MLSea-KG, an RDF graph containing ML datasets, pipelines, implementations and scientific works from diverse sources. MLSea paves the way for improving the search, explainability and reproducibility of ML pipelines.
Ioannis Dasoulas, Duo Yang, Anastasia Dimou
Enabling Social Demography Research Using Semantic Technologies
Abstract
A shift in scientific publishing from paper-based to knowledge-based practices promotes reproducibility, machine actionability and knowledge discovery. This is important for disciplines like social demography, where study indicators are often social constructs such as race or education, hypothesis tests are challenging to compare due to their limited temporal and spatial coverage, and research output is presented in natural language, which can be ambiguous and imprecise. In this work, we present the MIRA resource to aid researchers in their research workflow and in publishing FAIR findings. MIRA consists of: (1) an ontology for social demography research, (2) a method for automated ontology population by prompting Large Language Models, and (3) a knowledge graph populated in terms of the ontology by annotating a set of research papers on health inequality. The resource allows researchers to formally represent their social demography research hypotheses and to discover research biases and novel research questions.
Lise Stork, Richard L. Zijdeman, Ilaria Tiddi, Annette ten Teije
SCOOP All the Constraints’ Flavours for Your Knowledge Graph
Abstract
Creating SHACL shapes for the validation of RDF graphs is a non-trivial endeavor. Automated shape extraction systems typically derive SHACL shapes from RDF graphs, and thus their effectiveness is inherently influenced by the size and complexity of the RDF graph. However, these systems often overlook the constraints imposed by individual artifacts, although RDF graphs are often constructed by applying ontology terms to heterogeneous data. Only a few systems extract SHACL shapes from either the data schema or the ontology, leading, in either case, to limited or incomplete constraints. We propose SCOOP, a framework that exploits all artifacts associated with the construction of an RDF graph, i.e. data schemas, ontologies, and mapping rules, and integrates the SHACL shapes extracted from each artifact into a unified shapes graph. We applied our approach to real-world use cases, and experimental results showed that SCOOP outperforms systems that extract SHACL shapes from RDF graphs, generating more than double the types of constraints of those systems and effectively identifying missing and erroneous RDF triples during the validation process.
Resource type: Software Framework — License: Apache-2.0
Xuemin Duan, David Chaves-Fraga, Olivier Derom, Anastasia Dimou
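The shapes SCOOP integrates are standard SHACL, so the validation step described in the abstract can be reproduced with the pyshacl package. A minimal sketch, with hypothetical file names:

    from rdflib import Graph
    from pyshacl import validate

    data = Graph().parse("data.ttl")            # hypothetical RDF graph
    shapes = Graph().parse("scoop_shapes.ttl")  # hypothetical SCOOP output

    conforms, report_graph, report_text = validate(
        data_graph=data,
        shacl_graph=shapes,
        inference="rdfs",
    )
    print("Conforms:", conforms)
    print(report_text)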
FidMark: A Fiducial Marker Ontology for Semantically Describing Visual Markers
Abstract
Fiducial markers are visual objects that can be placed in the field of view of an imaging sensor to determine its position and orientation, and subsequently the scale and position of other objects within the same field of view. They are used in a wide variety of applications, ranging from medical applications to augmented reality (AR) solutions where they are applied to determine the location of an AR headset. Despite the wide range of marker types, each with advantages for specific use cases, there exists no standard for deciding which marker is best used in which situation. This leads to proprietary AR solutions that rely on a predefined set of marker and pose detection algorithms, preventing interoperability between AR applications. We propose FidMark, a fiducial marker ontology that classifies and describes the different markers available for computer vision and augmented reality, along with their spatial position and orientation. Our proposed ontology also describes the procedures required to perform pose estimation and marker detection, allowing the description of the algorithms used to perform these procedures. With FidMark we aim to enable future AR solutions to semantically describe markers within an environment so that third-party applications can utilise this information.
Maxim Van de Wynckel, Isaac Valadez, Beat Signer
Backmatter
Metadata
Title
The Semantic Web
Edited by
Albert Meroño Peñuela
Anastasia Dimou
Raphaël Troncy
Olaf Hartig
Maribel Acosta
Mehwish Alam
Heiko Paulheim
Pasquale Lisena
Copyright year
2024
Electronic ISBN
978-3-031-60635-9
Print ISBN
978-3-031-60634-2
DOI
https://doi.org/10.1007/978-3-031-60635-9