
2025 | Book

The Semantic Web – ISWC 2024

23rd International Semantic Web Conference, Baltimore, MD, USA, November 11–15, 2024, Proceedings, Part II

Editors: Gianluca Demartini, Katja Hose, Maribel Acosta, Matteo Palmonari, Gong Cheng, Hala Skaf-Molli, Nicolas Ferranti, Daniel Hernández, Aidan Hogan

Publisher: Springer Nature Switzerland

Book Series: Lecture Notes in Computer Science


About this book

This three-volume set constitutes the proceedings of the 23rd International Semantic Web Conference, ISWC 2024, held in Baltimore, MD, USA, during November 11–15, 2024.

The 44 full papers presented in these proceedings were carefully reviewed and selected from 155 submissions. This conference focuses on research on the Semantic Web, including benchmarks, knowledge graphs, tools and vocabularies.

Table of Contents

Frontmatter

Research Track

Frontmatter
AdaptLIL: A Real-Time Adaptive Linked Indented List Visualization for Ontology Mapping
Abstract
Visual support designed to facilitate human interaction with ontological data has largely focused on one-size-fits-all solutions, where less attention has been paid to providing personalized visual cues to assist the user in the process of comprehending complex datasets and relationships. To address this research gap, this paper presents an adaptive visualization designed to tailor visual cues to an individual user during ontology mapping activities. The adaptive visualization utilizes physiological signals such as eye gaze to predict one’s success in a given task, and in the event of a predicted failure, real-time visual interventions in the form of highlighting and deemphasis are deployed to direct user attention and assist with task completion. The proposed adaptive visualization is compared to a non-adaptive baseline in a user study involving 76 participants. The experimental results show statistically significant increases in user success with similarly perceived workload and time on task compared to those of the baseline, indicating that the proposed adaptive visualization is effective at improving user performance without tradeoff in workload or task speed. Furthermore, we report on the impact of highlighting and deemphasis on user success and provide recommendations for the development of future adaptive visualizations in human-machine teaming scenarios.
Bo Fu, Nicholas Chow
DISCIE – Discriminative Closed Information Extraction
Abstract
This paper introduces a novel method for closed information extraction. The method employs a discriminative approach that incorporates type and entity-specific information to improve relation extraction accuracy, particularly benefiting long-tail relations. Notably, this method demonstrates superior performance compared to state-of-the-art end-to-end generative models. This is especially evident for the problem of large-scale closed information extraction where we are confronted with millions of entities and hundreds of relations. Furthermore, we emphasize the efficiency aspect by leveraging smaller models. In particular, the integration of type-information proves instrumental in achieving performance levels on par with or surpassing those of a larger generative model. This advancement holds promise for more accurate and efficient information extraction techniques.
Cedric Möller, Ricardo Usbeck
Expanding the Scope: Inductive Knowledge Graph Reasoning with Multi-starting Progressive Propagation
Abstract
Knowledge graphs (KGs) are widely acknowledged as incomplete, and new entities are constantly emerging in the real world. Inductive KG reasoning aims to predict missing facts for these new entities. Among existing models, graph neural networks (GNNs) based ones have shown promising performance for this task. However, they are still challenged by inefficient message propagation due to distance and scalability issues. In this paper, we propose a new inductive KG reasoning model, MStar, by leveraging conditional message passing neural networks (C-MPNNs). Our key insight is to select multiple query-specific starting entities to expand the scope of progressive propagation. To propagate query-related messages to a farther area within limited steps, we subsequently design a highway layer to propagate information toward these selected starting entities. Moreover, we introduce a training strategy called LinkVerify to mitigate the impact of noisy training samples. Experimental results validate that MStar achieves superior performance compared with state-of-the-art models, especially for distant entities.
Zhoutian Shao, Yuanning Cui, Wei Hu
Compiling SHACL Into SQL
Abstract
Constraints on graph data expressed in the Shapes Constraint Language (SHACL) can be quite complex. This brings the challenge of efficient validation of complex SHACL constraints on graph data. This challenge is remarkably similar to the processing of analytical queries, investigated intensively in the database community. Motivated by this observation, we have devised an efficient compilation technique from SHACL into SQL, under a natural relational representation of RDF graphs. Our conclusion is that the powerful processing and optimization techniques, already offered by modern SQL engines, are more than up to the challenge.
Maxime Jakubowski, Jan Van den Bussche
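The flavor of such a compilation can be pictured with a small example (a sketch only, not the paper's actual translation scheme; the table layout and shape are assumptions): under a single triple table triples(s, p, o), a shape requiring every instance of ex:Person to have at least one ex:name (sh:minCount 1) might compile to SQL that retrieves the violating nodes:

```sql
-- Targets of the shape: instances of ex:Person.
-- Violations: targets with no ex:name triple (sh:minCount 1).
SELECT t.s AS violating_node
FROM triples t
WHERE t.p = 'rdf:type'
  AND t.o = 'ex:Person'
  AND NOT EXISTS (
    SELECT 1
    FROM triples n
    WHERE n.s = t.s
      AND n.p = 'ex:name'
  );
```

An anti-join of this kind is exactly the sort of pattern that analytical SQL engines are built to optimize, which is the observation motivating the paper.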
DUNKS: Chunking and Summarizing Large and Heterogeneous Data for Dataset Search
Abstract
With the vast influx of open data on the Web, dataset search has become a trending research problem which is crucial to data discovery and reuse. Existing methods for dataset search either employ only the unstructured metadata of datasets but ignore their actual data, or cater to structured data in a single format such as RDF despite the diverse formats of open data. In this paper, to address the magnitude of large datasets, we decompose RDF data into data chunks, and then, to accommodate big chunks to the limited input capacity of dense ranking models based on pre-trained language models, we propose a multi-chunk summarization method that extracts representative data from representative chunks. Moreover, to handle heterogeneous data formats beyond RDF, we transform other formats into chunks to be processed in a uniform way. Experiments on two test collections for dataset search demonstrate the effectiveness of our dense ranking over summarized data chunks.
Qiaosheng Chen, Xiao Zhou, Zhiyang Zhang, Gong Cheng
CRAWD: Sampling-Based Estimation of Count-Distinct SPARQL Queries
Abstract
Count-distinct SPARQL queries compute the number of unique values in the results of a query executed on a Knowledge Graph. However, counting the exact number of distinct values is often computationally demanding and time-consuming. As a result, these queries often fail on public SPARQL endpoints due to fair use policies. In this paper, we propose CRAWD, a new sampling-based approach designed to approximate count-distinct SPARQL queries. CRAWD significantly improves sampling efficiency and allows feasible execution of count-distinct SPARQL queries on public SPARQL endpoints, considerably improving existing methods.
Thi Hoang Thi Pham, Pascal Molli, Brice Nédelec, Hala Skaf-Molli, Julien Aimonier-Davat
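As an illustration of the query class (a generic example over Wikidata, not taken from the paper), a count-distinct SPARQL query counting the distinct classes used with wdt:P31 looks like:

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Exact evaluation must deduplicate the ?class bindings across
# all matching triples — the expensive step that a sampling-based
# approach approximates.
SELECT (COUNT(DISTINCT ?class) AS ?n)
WHERE { ?item wdt:P31 ?class . }
```

On a large knowledge graph, the deduplication touches every matching triple, which is why such queries routinely exceed the quotas of public endpoints.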
Exploiting Distant Supervision to Learn Semantic Descriptions of Tables with Overlapping Data
Abstract
Understanding the semantic structure of tabular data is essential for data integration and discovery. Specifically, the goal is to annotate columns in a tabular source with types and relationships between them using classes and predicates of a target ontology. Previous work that exploits the matches between entities in a knowledge graph and the table data does not perform well for tables with noisy or ambiguous data. A key reason for this poor performance is the limited amount of labeled data to train these methods. To address this problem, we propose a novel distant supervision approach that leverages existing Wikipedia tables and hyperlinks to automatically label tables with their semantic descriptions. Then, we use the labeled dataset to train neural network models to predict the semantic description of a new table. Our empirical evaluation shows that using the automatically labeled dataset provides approximately 5% improvement in column type prediction and 4.5% improvement in column relationship prediction in F1 scores over the state-of-the-art on a large set of real-world tables.
Binh Vu, Craig A. Knoblock, Basel Shbita, Fandel Lin
PathFinder: Returning Paths in Graph Queries
Abstract
Path queries are a central feature of all modern graph query languages and standards, such as SPARQL, Cypher, SQL/PGQ, and GQL. While SPARQL returns endpoints of path queries, it is possible in Cypher, SQL/PGQ, and GQL to return entire paths. In this paper, we present the first framework for returning paths that match regular path queries under all fifteen modes in the SQL/PGQ and GQL standards. At the core of our approach is the product graph construction combined with a way to compactly represent a potentially exponential number of results that can match a path query. Throughout the paper we describe how this approach operates on a conceptual level and provide runtime guarantees for evaluating path queries. We also develop a reference implementation on top of an existing open-source graph processing engine, and perform a detailed analysis of path querying over Wikidata to gauge the usefulness of our methods in a real world scenario. Compared to several modern graph engines, we obtain order-of-magnitude speedups and remarkably stable performance, even for theoretically intractable queries.
Benjamín Farías, Wim Martens, Carlos Rojas, Domagoj Vrgoč
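The distinction the paper builds on can be seen with a simple reachability query (illustrative only; the property is a placeholder): in SPARQL, a property path returns only endpoint pairs, with no way to inspect the matched path itself:

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Returns (start, end) pairs only; the intermediate nodes of each
# matched path are not accessible in the result bindings.
SELECT ?x ?y
WHERE { ?x foaf:knows+ ?y . }
```

In Cypher, by contrast, a query such as `MATCH p = (x)-[:KNOWS*]->(y) RETURN p` binds the entire path, and SQL/PGQ and GQL further prescribe modes (e.g., shortest, trail) governing which paths are returned — the behavior this framework supports for regular path queries.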
eSPARQL: Representing and Reconciling Agnostic and Atheistic Beliefs in RDF-star Knowledge Graphs
Abstract
Over the past few years, we have seen the emergence of large knowledge graphs combining information from multiple sources. Sometimes, this information is provided in the form of assertions about other assertions, defining contexts where assertions are valid. A recent extension to RDF which admits statements over statements, called RDF-star, is in revision to become a W3C standard. However, there is no proposal for a semantics of these RDF-star statements nor a built-in facility to operate over them. In this paper, we propose a query language for epistemic RDF-star metadata based on a four-valued logic, called eSPARQL. Our proposed query language extends SPARQL-star, the query language for RDF-star, with a new type of FROM clause to facilitate operating with multiple and sometimes conflicting beliefs. We show that the proposed query language can express four use case queries, including the following features: (i) querying the belief of an individual, (ii) aggregating beliefs, (iii) querying who is in conflict with whom, and (iv) beliefs about beliefs (i.e., nesting of beliefs).
Xinyi Pan, Daniel Hernández, Philipp Seifer, Ralf Lämmel, Steffen Staab
Understanding SPARQL Queries: Are We Already There? Multilingual Natural Language Generation Based on SPARQL Queries and Large Language Models
Abstract
SPARQL is a standard query language for RDF data. Interpreting SPARQL queries can be challenging, in particular for users who are not familiar with the technical specifications of SPARQL or with the meaning of the thing identified by a resource. In our study, we take an initial step toward employing Large Language Models to verbalize SPARQL queries, i.e., convert them to natural language. While other research has often focused only on English verbalizations, we also implemented the transformation into German and Russian textual representations. The experimental framework leverages a combination of proprietary and open-source models, with enhancements achieved through further fine-tuning of these models. Our methodology is assessed using the well-known question answering datasets QALD-9-plus and QALD-10, focusing on the aforementioned three languages: English, German, and Russian. To analyze performance quality, we employ metrics for machine translation alongside a survey for human evaluation. Although we encountered specific error types such as question over-specification, linguistic discrepancies, and semantic mismatches, the findings of our research indicate that Large Language Models are well suited for the task of translating SPARQL queries into natural language, such that the semantics of SPARQL queries is represented in the users' mother tongue.
Aleksandr Perevalov, Aleksandr Gashkov, Maria Eltsova, Andreas Both
Advancing Robotic Perception with Perceived-Entity Linking
Abstract
The capabilities of current robotic applications are significantly constrained by their limited ability to perceive and understand their surroundings. The Semantic Web aims to offer general, machine-readable knowledge about the world and could be a potential solution to address the information needs of robotic agents. We introduce the Perceived-Entity Linking (PEL) problem as the task of recognizing entities and linking the sensory data of an autonomous agent to a unique identifier in a target knowledge graph. We provide a formal definition of PEL, and propose a PEL baseline based on the YOLO object detection algorithm and a conventional entity linking method as an initial attempt to solve the task. The baseline is evaluated by linking the concepts contained in MS COCO and VisualGenome datasets to WikiData, DBpedia and YAGO as target knowledge graphs. This study makes a first step in allowing robotic agents to leverage the extensive knowledge contained in general-purpose knowledge graphs.
Mark Adamik, Romana Pernisch, Ilaria Tiddi, Stefan Schlobach
PreAdapter: Pre-training Language Models on Knowledge Graphs
Abstract
Pre-trained language models have demonstrated state-of-the-art performance in various downstream tasks such as summarization, sentiment classification, and question answering. Leveraging vast amounts of textual data during training, these models inherently hold a certain amount of factual knowledge, which is particularly beneficial for knowledge-driven tasks such as question answering. However, the knowledge implicitly contained within the language models is not complete. Consequently, many studies incorporate additional knowledge from Semantic Web resources such as knowledge graphs, which provide an explicit representation of knowledge in the form of triples.
Seamless integration of this knowledge into language models remains an active research area. Direct pre-training of language models on knowledge graphs followed by fine-tuning on downstream tasks has proven ineffective, primarily due to the catastrophic forgetting effect. Many approaches suggest fusing language models with graph embedding models to enrich language models with information from knowledge graphs, showing improvement over solutions that lack knowledge graph integration in downstream tasks. However, these methods often require additional computational overhead, for instance, by training graph embedding models.
In our work, we propose a novel adapter-based method for integrating knowledge graphs into language models through pre-training. This approach effectively mitigates catastrophic forgetting that can otherwise affect both the original language modeling capabilities and the access to pre-trained knowledge. Through this scheme, our approach ensures access to both the original capabilities of the language model and the integrated Semantic Web knowledge during fine-tuning on downstream tasks. Experimental results on multiple choice question answering tasks demonstrate performance improvements compared to baseline models without knowledge graph integration and other pre-training-based knowledge integration methods.
Janna Omeliyanenko, Andreas Hotho, Daniel Schlör
PRONTO: Prompt-Based Detection of Semantic Containment Patterns in MLMs
Abstract
Masked Language Models (MLMs) like BERT and RoBERTa excel at predicting missing words based on context, but their ability to understand deeper semantic relationships is still being assessed. While MLMs have demonstrated impressive capabilities, it is still unclear if they merely exploit statistical word co-occurrence or if they can capture a deeper, structured understanding of meaning, similar to how knowledge is organized in ontologies. This is a topic of increasing interest, with researchers seeking to understand how MLMs might internally represent concepts like ontological classes and semantic containment relations (e.g., sub-class and instance-of). Unveiling this knowledge could have significant implications for Semantic Web applications, but it necessitates a profound understanding of how these models express such relationships. This work investigates whether MLMs can understand these relationships, presenting a novel approach to automatically leverage the predictions returned by MLMs to discover semantic containment relations in unstructured text. We achieve this by constructing a verbalizer, a system that translates the model’s internal predictions into classification labels. Through a comprehensive probing procedure, we assess the method’s effectiveness, reliability, and interpretability. Our findings demonstrate a key strength of MLMs: their ability to capture semantic containment relationships. These insights bring significant implications for MLM application in ontology construction and aligning text data with ontologies.
Alessandro De Bellis, Vito Walter Anelli, Tommaso Di Noia, Eugenio Di Sciascio
Backmatter
Metadata
Title
The Semantic Web – ISWC 2024
Editors
Gianluca Demartini
Katja Hose
Maribel Acosta
Matteo Palmonari
Gong Cheng
Hala Skaf-Molli
Nicolas Ferranti
Daniel Hernández
Aidan Hogan
Copyright Year
2025
Electronic ISBN
978-3-031-77850-6
Print ISBN
978-3-031-77849-0
DOI
https://doi.org/10.1007/978-3-031-77850-6
