
## About this book

This book constitutes the thoroughly refereed proceedings of the 9th Joint International Semantic Technology Conference, JIST 2019, held in Hangzhou, China, in November 2019.

The 24 full papers presented were carefully reviewed and selected from 70 submissions. They present applications of semantic technologies, theoretical results, new algorithms and tools to facilitate the adoption of semantic technologies and are organized in topical sections on knowledge graphs; data management; question answering and NLP; ontology and reasoning; government open data; and semantic web for life sciences.

## Table of contents

### Incorporating Term Definitions for Taxonomic Relation Identification

Abstract
Taxonomic relations (also called “is-A” relations) are key components in taxonomies, semantic hierarchies and knowledge graphs. Previous works on identifying taxonomic relations are mostly based on linguistic and distributional approaches. However, these approaches are limited by the availability of a large enough corpus that can cover all terms of interest and provide sufficient contextual information to represent their meanings. Therefore, the generalization abilities of the approaches are far from satisfactory. In this paper, we propose a novel neural network model to enhance the semantic representations of term pairs by encoding their respective definitions for the purpose of taxonomic relation identification. This has two main benefits: (i) Definitional sentences represent specified corpus-independent meanings of terms, hence definition-driven approaches have a great generalization capability to identify unseen terms and taxonomic relations which are not expressed in domain specificity of the training data; (ii) Global contextual information from a large corpus and definitions in the sense level can provide richer interpretation of terms from a broader knowledge base perspective, and benefit the accurate prediction for the taxonomic relations of term pairs. The experimental results show that our model outperforms several competitive baseline methods in terms of F-score on both specific and open domain datasets.
Yongpan Sheng, Tianxing Wu, Xin Wang

### Report on the First Knowledge Graph Reasoning Challenge 2018

Toward the eXplainable AI System
Abstract
A new challenge for knowledge graph reasoning started in 2018. Deep learning has promoted the application of artificial intelligence (AI) techniques to a wide variety of social problems. Accordingly, being able to explain the reason for an AI decision is becoming important to ensure the secure and safe use of AI techniques. Thus, we, the Special Interest Group on Semantic Web and Ontology of the Japanese Society for AI, organized a challenge calling for techniques that reason and/or estimate which characters are criminals while providing a reasonable explanation based on an open knowledge graph of a well-known Sherlock Holmes mystery story. This paper presents a summary report of the first challenge held in 2018, including the knowledge graph construction, the techniques proposed for reasoning and/or estimation, the evaluation metrics, and the results. The first prize went to an approach that formalized the problem as a constraint satisfaction problem and solved it using a lightweight formal method; the second prize went to an approach that used SPARQL and rules; the best resource prize went to a submission that constructed word embedding of characters from all sentences of Sherlock Holmes novels; and the best idea prize went to a discussion multi-agents model. We conclude this paper with the plans and issues for the next challenge in 2019.
Takahiro Kawamura, Shusaku Egami, Koutarou Tamura, Yasunori Hokazono, Takanori Ugai, Yusuke Koyanagi, Fumihito Nishino, Seiji Okajima, Katsuhiko Murakami, Kunihiko Takamatsu, Aoi Sugiura, Shun Shiramatsu, Xiangyu Zhang, Kouji Kozaki

### Violence Identification in Social Media

Abstract
A knowledge-based methodology is proposed for the identification of type and level of violence presented implicitly in shared comments on social media. The work was focused on the semantic processing taking into account the content and handling comments as excerpts of knowledge. Our approach implements similarity measures, conceptual distances, graph theory algorithms, knowledge graphs and disambiguation processes.
The methodology is composed of four stages. In the (1) “knowledge base construction” stage, the types and levels of violence are described, as well as the administration of the knowledge graphs. Mechanisms of inclusion and extraction were developed for the knowledge base’s handling and content understanding. The (2) “social media data collection” stage retrieves comments and maps the social graph’s structure. In the (3) “knowledge processing” stage, the comments are transformed into formal representations as extracts of knowledge (graphs). Finally, in the (4) “violence domain identification” stage, the comments are classified by their type and level of violence. The evaluation was carried out by comparing our methodology with the baselines: (1) a dataset with comments labeled by CrowdFlower users, (2) news from the social network Twitter, (3) a similar research work and (4) typical lexical matching.
Julio Vizcarra, Ken Fukuda, Kouji Kozaki

### Event-Oriented Wiki Document Generation

Abstract
We aim to automatically generate event-oriented Wikipedia articles by viewing it as a multi-document summarization problem. In this paper, we propose a new model named WikiGen, which consists of two parts: the first one induces a general topic template from existing Wikipedia articles, and the second one generates a summary for each topic by collecting, filtering, and integrating relevant web news, which will be assembled into the full document. Our evaluation results show that WikiGen is capable of generating fluent and comprehensive Wikipedia documents and outperforms previous work, achieving state-of-the-art ROUGE scores.
Fangwei Zhu, Zhengguo Wang, Juanzi Li, Lei Hou, Jiaxin Shi, Shining Lv, Ran Shen, Junjun Jiang

### A Linked Data Model-View-* Approach for Decoupled Client-Server Applications

Abstract
Separation of concerns is found to be a crucial design requirement for maintainable, extendable and understandable software. Research has been done on software design patterns that ensure strict separation of concerns and thereby avoid cross-cutting concerns in modules of large-scale software projects. In particular, Model-View-* design patterns attempt to decouple local data and business logic from user interfaces, keeping both extendable and exchangeable. Targeting Web applications, technologies from the domain of Linked Data and Semantic Web have been found suitable to decouple clients from servers. While the potential of both Model-View-* patterns and Linked Data interfaces is often convincingly outlined, there exists to this point little to no work that shows how the findings in said fields can be successfully employed to design large-scale decoupled client-server Web applications. In this paper, we show how lifting a suitable data representation of a Web server application run-time to Linked Data allows building client-server applications following a decoupled Model-View-Presenter-ViewModel design pattern. This removes the need for fixed server-side APIs, detaches clients from server specifics, and allows clients to implement their business logic entirely on expected semantics of the server data.
Torsten Spieldenner, René Schubotz

### JECI: A Joint Knowledge Graph Embedding Model for Concepts and Instances

Abstract
Concepts and instances are important parts of knowledge graphs, but most knowledge graph embedding models treat them equally as entities, which leads to inaccurate embeddings of concepts and instances. To address this problem, we propose a novel knowledge graph embedding model called JECI to jointly embed concepts and instances. First, JECI organizes the concepts in the knowledge graph as a hierarchical tree. Meanwhile, for an instance, JECI generates a context vector to represent its neighbor context in the knowledge graph. Then, based on the context vector and supervision information generated from the hierarchical tree, an embedding learner is designed to precisely locate an instance in the embedding space, from coarse-grained to fine-grained. A prediction function, in the form of a convolution, is designed to predict the concepts of different granularities that an instance belongs to. In this way, concepts and instances are jointly embedded, and the hierarchical structure is preserved in the embeddings. In particular, JECI can handle complex relations by incorporating neighbor information of instances. JECI is evaluated by link prediction and triple classification on real-world data. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.
Jing Zhou, Peng Wang, Zhe Pan, Zhongkai Xu

### Enhanced Entity Mention Recognition and Disambiguation Technologies for Chinese Knowledge Base Q&A

Abstract
Entity linking, which usually involves mention recognition and entity disambiguation, is an important task in knowledge base question answering (KBQA). However, due to the diversity of Chinese grammatical structure, the complexity of Chinese natural language expressions and the lack of contextual information, there are still many challenges in Chinese KBQA. We discuss the two subtasks of entity linking separately. For the mention recognition part, in order to get the only topic entity mention of the question, we propose a topic entity mention recognition algorithm based on sequence annotation. The algorithm combines a variety of feature vectors based on word embedding, and uses a BiGRU-CRF model to perform sequence labeling. We also propose an entity disambiguation algorithm based on a similarity calculation with extended information. The algorithm not only realizes information expansion by crawling the candidate entity for related problems, but also makes full use of contextual information by combining lexical-level similarity and sentence semantic similarity. The experimental results show that the proposed entity linking solution possesses huge advantages compared to several baseline systems.
Gang Wu, Wenfang Wu, Hangxu Ji, Xianxian Hou, Li Xia

### Dispute Generation in Law Documents via Joint Context and Topic Attention

Abstract
In this paper, we study the Dispute Generation (DG) problem from the plaintiff allegation (PA) and the defendant argument (DA) in a law document. We are the first to formulate DG as a text-to-text natural language generation (NLG) problem. Since the logical relationships between a PA and a DA are rather difficult to identify, existing models cannot generate accurate disputes, let alone find all disputes. To solve this problem, we propose a novel Seq2Seq model with two dispute detection modules, which captures relationships between the PA and the DA in two ways. First, in the context-level detection module, we employ a hierarchical attention mechanism to learn sentence representations and a joint attention mechanism to match the right disputes. Second, in the topic-level detection module, topic information is taken into account to find indirect disputes. We conduct extensive experiments on a real-world dataset. The results demonstrate the effectiveness of our method. The results also show that the context-level and the topic-level detection modules can improve the accuracy and coverage of generated disputes.
Sheng Bi, Xiya Cheng, Jiamin Chen, Guilin Qi, Meng Wang, Youyong Zhou, Lusheng Wang

### Richpedia: A Comprehensive Multi-modal Knowledge Graph

Abstract
Large-scale knowledge graphs such as Wikidata and DBpedia have become a powerful asset for semantic search and question answering. However, most knowledge graph construction work focuses on organizing and discovering textual knowledge in a structured representation while paying little attention to the proliferation of visual resources on the Web. To improve the situation, in this paper, we present Richpedia, which aims to provide a comprehensive multi-modal knowledge graph by distributing sufficient and diverse images to textual entities in Wikidata. We also set RDF links (visual semantic relations) between image entities based on the hyperlinks and descriptions in Wikipedia. The Richpedia resource is accessible on the Web via a faceted query endpoint and provides a pathway for knowledge graph and computer vision tasks, such as link prediction and visual relation detection.
Meng Wang, Guilin Qi, HaoFen Wang, Qiushuo Zheng

### DSEL: A Domain-Specific Entity Linking System

Abstract
Xinru Zhang, Huifang Xu, Yixin Cao, Yuanpeng Tan, Lei Hou, Juanzi Li, Jiaxin Shi

### Exploring the Generalization of Knowledge Graph Embedding

Abstract
Knowledge graph embedding aims to represent structured entities and relations as continuous and dense low-dimensional vectors. With more and more embedding models being proposed, it has been widely used in many tasks such as semantic search, knowledge graph completion and intelligent question and answer. Most knowledge graph embedding models focus on how to get information about different entities and relations. However, the generalization of knowledge graph embedding or the link prediction ability is not well-studied empirically and theoretically. The study of generalization ability is conducive to further improving the performance of the model. In this paper, we propose two measures to quantify the generalization ability of knowledge graph embedding and use them to analyze the performance of translation-based models. Extensive experimental results show that our measures can well evaluate the generalization ability of a knowledge graph embedding model.
Liang Zhang, Huan Gao, Xianda Zheng, Guilin Qi, Jiming Liu
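
As an illustration of the translation-based models mentioned in the abstract above, the following is a minimal sketch of a TransE-style scoring function. TransE is one well-known translation-based model; the abstract does not say which models the paper analyzes, so the choice of TransE, the function names `transe_score` and `rank_tails`, and the toy vectors are all assumptions made for illustration. A triple (head, relation, tail) is considered plausible when the head embedding translated by the relation embedding lands close to the tail embedding.

```python
import math


def transe_score(h, r, t):
    """L2 distance ||h + r - t||; lower means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(h, r, candidates):
    """Order candidate tail names from most to least plausible."""
    return sorted(candidates, key=lambda name: transe_score(h, r, candidates[name]))


# Hypothetical 2-dimensional embeddings, not taken from the paper.
head = [1.0, 0.0]
relation = [0.0, 1.0]
tails = {"a": [1.0, 1.0], "b": [3.0, 1.0]}

ranking = rank_tails(head, relation, tails)
```

Link prediction, the task the abstract refers to, amounts to producing such a ranking over all candidate entities and checking where the correct one lands.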

### Incorporating Instance Correlations in Distantly Supervised Relation Extraction

Abstract
Distantly-supervised relation extraction has proven to be effective to find relational facts from texts. However, the existing approaches treat the instances in the same bag independently and ignore the semantic structural information. In this paper, we propose a graph convolution network (GCN) model with an attention mechanism to improve relation extraction. For each bag, the model first builds a graph through the dependency tree of each instance in this bag. In this way, the correlations between instances are built through their common words. The learned node (word) embeddings which encode the bag information are then fed into the sentence encoder, i.e., text CNN to obtain better representations of sentences. Besides, an instance-level attention mechanism is introduced to select valid instances and learn the textual relation embedding. Finally, the learned embedding is used to train our relation classifier. Experiments on two benchmark datasets demonstrate that our model significantly outperforms the compared baselines.
Luhao Zhang, Linmei Hu, Chuan Shi

### A Physical Embedding Model for Knowledge Graphs

Abstract
Knowledge graph embedding methods learn continuous vector representations for entities in knowledge graphs and have been used successfully in a large number of applications. We present a novel and scalable paradigm for the computation of knowledge graph embeddings, which we dub Pyke. Our approach combines a physical model based on Hooke’s law and its inverse with ideas from simulated annealing to compute embeddings for knowledge graphs efficiently. We prove that Pyke achieves a linear space complexity. While the time complexity for the initialization of our approach is quadratic, the time complexity of each of its iterations is linear in the size of the input knowledge graph. Hence, Pyke’s overall runtime is close to linear. Consequently, our approach easily scales up to knowledge graphs containing millions of triples. We evaluate our approach against six state-of-the-art embedding approaches on the DrugBank and DBpedia datasets in two series of experiments. The first series shows that the cluster purity achieved by Pyke is up to 26% (absolute) better than that of the state of the art. In addition, Pyke is more than 22 times faster than existing embedding solutions in the best case. The results of our second series of experiments show that Pyke is up to 23% (absolute) better than the state of the art on the task of type prediction while maintaining its superior scalability. Our implementation and results are open-source and are available at http://github.com/dice-group/PYKE.
Caglar Demir, Axel-Cyrille Ngonga Ngomo

### Iterative Visual Relationship Detection via Commonsense Knowledge Graph

Abstract
Visual relationship detection, i.e., discovering the interaction between pairs of objects in an image, plays a significant role in image understanding. However, most recent works only consider visual features, ignoring the implicit effect of common sense. Motivated by iterative visual reasoning in image recognition, we propose a novel model that takes advantage of common sense, in the form of a knowledge graph, in visual relationship detection, named Iterative Visual Relationship Detection with Commonsense Knowledge Graph (IVRDC). Our model consists of two modules: a feature module that predicts predicates from visual features and semantic features with a bi-directional RNN; and a commonsense knowledge module that constructs a specific commonsense knowledge graph for predicate prediction. After iteratively combining predictions from both modules, IVRDC updates the memory and commonsense knowledge graph. The final predictions are made by taking the result of each iteration into account with an attention mechanism. Our experiments on the Visual Relationship Detection (VRD) dataset and the Visual Genome (VG) dataset demonstrate that our proposed model is competitive.
Hai Wan, Jialing Ou, Baoyi Wang, Jianfeng Du, Jeff Z. Pan, Juan Zeng

### A Dynamic and Informative Intelligent Survey System Based on Knowledge Graph

Abstract
In this paper, we propose a dynamic and informative solution to an intelligent survey system that is based on a knowledge graph. To illustrate our proposal, we focus on ordering the questions of the questionnaire component by their acceptance, along with conditional triggers that further customise participants’ experience, making the system dynamic. Evaluation of the system shows that the dynamic component can be beneficial in terms of lowering the number of questions asked and improving the quality of data, allowing more informative data to be collected in a survey of equivalent length. Fine-grained analysis allows assessment of the interaction of specific variables, as well as of individual respondents rather than just global results. The paper explores and evaluates two algorithms for the presentation of survey questions, leading to additional insights about how to improve the system.
Patrik Bansky, Elspeth Edelstein, Jeff Z. Pan, Adam Wyner

### CICO: Chemically Induced Carcinogenesis Ontology

Abstract
In vivo experiments have had a great impact on the development of biomedicine, and as a result, a variety of biomedical data is produced and provided to researchers. Standardization and ontology design were carried out for the systematic management and effective sharing of these data. As a result of these efforts, useful ontologies such as the Experimental Factor Ontology (EFO), Disease Ontology (DO), Gene Ontology (GO), and Chemical Entities of Biological Interest (ChEBI) were developed. However, these ontologies are not enough to provide knowledge about the experiments to researchers conducting in vivo studies. Specifically, in the experimental design process, the generation of cancer incurs considerable time and research costs. Researchers conducting animal experiments need animals with signs of carcinogenesis that fit their research interests. Therefore, our study is intended to provide experimental data about inducing cancer in animals. In order to provide this data, we collect experimental data about chemical substances that cause cancer. After that, we design an ontology based on these data and link it with the Disease Ontology. Our research focuses largely on two aspects. The first is to create a knowledge graph that inter-links with other biomedical linked data. The second is to provide practical knowledge to researchers conducting in vivo experiments. In conclusion, our research is provided in the form of a web service, which makes it easy to use the SPARQL endpoint and search service.
Sungmin Yang, Hyunwhan Joe, Sungkwon Yang, Hong-Gee Kim

### Retrofitting Soft Rules for Knowledge Representation Learning

Abstract
Recently, a significant number of studies have focused on knowledge graph completion using rule-enhanced learning techniques, supported by mined soft rules in addition to hard logic rules. However, due to the difficulty of determining the confidences of the soft rules without the global semantics of the knowledge graph, such as the semantic relatedness between relations, the knowledge representation may not be optimal, leading to degraded effectiveness in its application to knowledge graph completion tasks. To address this challenge, this paper proposes a retrofit framework that iteratively enhances the knowledge representation and the confidences of soft rules. Specifically, the soft rules guide the learning of the knowledge representation, and the representation, in turn, provides the global semantics of the knowledge graph to optimize the confidences of the soft rules. Extensive evaluation shows that our method achieves new state-of-the-art results on link prediction and triple classification tasks, brought about by the fine-tuned confidences of soft rules.
Bo An, Xianpei Han, Le Sun

### Entity Synonym Discovery via Multiple Attentions

Abstract
Entity synonym discovery is an important task, and it can benefit many downstream applications, such as web search, question answering and knowledge graph construction. Two types of approaches are widely exploited to discover synonyms from a raw text corpus: distribution-based approaches and pattern-based approaches. However, they suffer from either low precision or low recall. In this paper, we propose a novel framework, SynMine, to extract synonyms from massive raw text corpora. The framework can integrate corpus-level statistics and local contexts in a unified way via a multi-attention mechanism. Extensive experiments on a real-world dataset show the effectiveness of our approach.
Jiale Yu, Weiming Lu, Wei Xu, Zeyun Tang

### Towards Association Rule-Based Complex Ontology Alignment

Abstract
Ontology alignment has been studied for over a decade, and over that time many alignment systems have been developed by researchers in order to find simple 1-to-1 equivalence alignments between ontologies. However, finding complex alignments, i.e., alignments that are not simple class or property equivalences, is a topic largely unexplored but with growing significance. Currently, establishing a complex alignment requires domain experts to work together to manually generate the alignment, which is extremely time-consuming and labor-intensive. In this paper, we propose an automated method based on association rule mining to detect not only simple alignments, but also more complex alignments between ontologies. Our algorithm can also be used in a semi-automated fashion to effectively assist users in finding potential complex alignments which they can then validate or edit. In addition, we evaluate the performance of our algorithm on the complex alignment benchmark of the Ontology Alignment Evaluation Initiative (OAEI).
Lu Zhou, Michelle Cheatham, Pascal Hitzler
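
The association rule mining that the abstract above builds on rests on two standard measures, support and confidence. The sketch below shows how they are computed over a toy set of transactions. It is not the authors' algorithm: the transaction encoding (one set of class and property labels per shared instance), the `o1:`/`o2:` names, and the helper functions are hypothetical and chosen only to make the measures concrete.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the transactions."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / antecedent_support


# Hypothetical data: labels from two ontologies (o1, o2) attached to
# four shared instances.
transactions = [
    {"o1:Paper", "o2:Document", "o2:hasAuthor"},
    {"o1:Paper", "o2:Document"},
    {"o1:Paper", "o2:Document", "o2:hasAuthor"},
    {"o1:Person", "o2:Agent"},
]

paper_to_document = confidence(transactions, {"o1:Paper"}, {"o2:Document"})
paper_to_author = confidence(transactions, {"o1:Paper"}, {"o2:hasAuthor"})
```

In this toy data, the rule from `o1:Paper` to `o2:Document` holds for every instance that carries `o1:Paper`, which is the kind of signal a rule-based approach could treat as a candidate correspondence for a user to validate.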

### Autonomous RDF Stream Processing for IoT Edge Devices

Abstract
The wide adoption of increasingly cheap and computationally powerful single-board computers has triggered the emergence of new paradigms for collaborative data processing among IoT devices. Motivated by the billions of ARM chips that have been shipped as IoT gateways so far, our paper proposes a novel continuous federation approach that uses RDF Stream Processing (RSP) engines as autonomous processing agents. These agents can coordinate their resources to distribute processing pipelines by delegating partial workloads to their peers via subscribing continuous queries. Our empirical study in “cooperative sensing” scenarios with resourceful experiments on a cluster of Raspberry Pi nodes shows that the scalability can be significantly improved by adding more autonomous agents to a network of edge devices on demand. The findings open several new interesting follow-up research challenges in enabling semantic interoperability for the edge computing paradigm.
Manh Nguyen-Duc, Anh Le-Tuan, Jean-Paul Calbimonte, Manfred Hauswirth, Danh Le-Phuoc

### Certain Answers to a SPARQL Query over a Knowledge Base

Abstract
Ontology-Mediated Query Answering (OMQA) is a well-established framework to answer queries over an RDFS or OWL Knowledge Base (KB). OMQA was originally designed for unions of conjunctive queries (UCQs), and based on certain answers. More recently, OMQA has been extended to SPARQL queries, but to our knowledge, none of the efforts made in this direction (either in the literature, or the so-called SPARQL entailment regimes) is able to capture both certain answers for UCQs and the standard interpretation of SPARQL over a plain graph. We formalize these as requirements to be met by any semantics aiming at conciliating certain answers and SPARQL answers, and define three additional requirements, which generalize to KBs some basic properties of SPARQL answers. Then we show that a semantics can be defined that satisfies all requirements for SPARQL queries with SELECT, UNION, and OPTIONAL, and for DLs with the canonical model property. We also investigate combined complexity for query answering under such a semantics over $\textit{DL-Lite}_{\mathcal{R}}$ KBs. In particular, we show for different fragments of SPARQL that known upper bounds for query answering over a plain graph are matched.
Julien Corman, Guohui Xiao

### External Knowledge-Based Weakly Supervised Learning Approach on Chinese Clinical Named Entity Recognition

Abstract
Automatic extraction of clinical named entities, such as body parts, drugs and surgeries, is of great significance for understanding clinical texts. Deep neural network approaches have recently achieved remarkable success in the named entity recognition task. However, most of these approaches train models from large, high-quality labeled data that is labor-intensive to produce. In order to reduce the labeling costs, we propose a weakly supervised learning method for clinical named entity recognition (CNER) tasks. We use a small amount of labeled data as a seed corpus, and propose a bootstrapping method integrating external knowledge to iteratively generate the labels for unlabeled data. The external knowledge consists of domain-specific dictionaries as well as a set of handcrafted rules. We conduct experiments on the CCKS-2018 CNER task dataset, and our approach achieves competitive results compared to the supervised approach with fully labeled data.
Yeheng Duan, Long-Long Ma, Xianpei Han, Le Sun, Bin Dong, Shanshan Jiang

### Metadata Application Profile Provenance with Extensible Authoring Format and PAV Ontology

Abstract
Metadata application profiles (MAP) serve a critical role in metadata interoperability. The Singapore Framework recommends publishing application profiles as documentation, with detailed usage guidelines aimed at maximizing reusability and interoperability. Authoring, maintenance, versioning, and ensuring the availability of previous versions along with changelogs are vital steps involved in MAP publishing. The longevity of the schema is a critical part of metadata longevity. A MAP should provide sufficient administrative information and versioning to ensure provenance and longevity as a record of changes of the metadata instance. The authors propose to include actionable changelogs and provenance information within an extensible MAP authoring format. The proposal also includes a recommendation on MAP versioning and publishing with PAV, a lightweight ontology for Provenance, Authoring, and Versioning.
Nishad Thalhath, Mitsuharu Nagamori, Tetsuo Sakaguchi, Shigeo Sugimoto

### An Ontology-Based Development of Activity Knowledge and System Design

Abstract
This paper describes an ontology-based development of activity knowledge on a domain and the system we developed to support it. To understand human activities, it is important to explicitly describe the knowledge of each domain. However, there are some issues of knowledge development: the establishment of the efficient method and process, the improvement of the readability for humans and machines, and the regular improvement of knowledge after development. We thus introduced a process of knowledge development, which uses two different types of knowledge representation (activity knowledge and domain ontology) on a domain that requires technical skills. In this study, we practiced the process in the music field to investigate the effects of developing activity knowledge based on a domain ontology. The results showed that it enables deep understanding and extension of knowledge. Furthermore, we designed a system to help the ontology-based development of activity knowledge. We rewrote the activity knowledge using the system and received preliminary results on term control.
Nami Iino, Hideaki Takeda, Takuichi Nishimura

### Backmatter
