## About this book

This book constitutes the thoroughly refereed proceedings of the 9th Joint International Semantic Technology Conference, JIST 2019, held in Hangzhou, China, in November 2019.

The 12 full papers and 12 short papers presented were carefully reviewed and selected from 70 submissions. The papers present applications of semantic technologies, theoretical results, new algorithms and tools to facilitate the adoption of semantic technologies.

## Table of Contents

### Building a Large-Scale Knowledge Graph for Elementary Education in China

Abstract
With the penetration of information technology into all areas of society, Internet-assisted education has become an important opportunity for current educational reform. To better assist teaching and learning and to help students deepen their understanding and absorption of knowledge, we build a knowledge graph for elementary education. First, we define an elementary education ontology and divide the knowledge graph into three sub-graphs. Then we extract concept instances and relation instances from textbooks and existing knowledge bases using an unsupervised method. In addition, we have acquired four different learning resources to assist in learning. Finally, the results show that the procedure we propose is scientific and efficient.
Wei Zheng, Zhichun Wang, Mingchen Sun, Yanrong Wu, Kaiman Li

### A Temporal Semantic Search System for Traditional Chinese Medicine Based on Temporal Knowledge Graphs

Abstract
Traditional Chinese medicine (TCM) is an important intangible cultural heritage of China. To enhance the services of TCM, many works focus on constructing various types of TCM knowledge graphs according to the concrete requirements such as information retrieval. However, most of them ignored several key issues. One is temporal information that is very important for TCM clinical diagnosis and treatment. For example, a herb needs to be boiled for different periods in different prescriptions, but existing methods cannot represent this temporal information very well. The other is that current TCM-based retrieval systems cannot effectively deal with the temporal intentions of search sentences, which leads to bad experiences for users in retrieval services. To solve these issues, we propose a new model tailored for TCM based on the temporal knowledge graph in this paper, which can effectively represent the clinical knowledge changing dynamically over time. Moreover, we implement a temporal semantic search system and employ reasoning rules based on our proposed model to complete the temporal intentions of search sentences. The preliminary result indicates that our system can obtain better results than existing methods in terms of precision.
Chengbiao Yang, Weizhuo Li, Xiaoping Zhang, Runshun Zhang, Guilin Qi

### Testing of Various Approaches for Semiautomatic Parish Records Word Standardization

Abstract
This paper deals with the clustering of words from parish records. Clustering is essential for downstream standardization. In the past, mostly in the 17th and the beginning of the 18th century, names did not have a standard form, so for further work it is essential to create clusters of words that have the same meaning. Besides the names, the parish records are written in various languages (Czech, German, Latin), so if we want to have relations between people, occupations, or causes of death in one language, we need to standardize them.
The first step of standardization is pre-processing, then we compare words, and the last step is classification into clusters. The most crucial step here is the comparison of words. We have tested various approaches such as Levenshtein distance and its modifications, Q-grams, Jaro-Winkler, and phonetic codings such as Soundex and Double Metaphone. All those methods have been tested with the types of words that can appear in parish records: first and last names, occupations, villages, and relationships between people. From these tests, we have chosen the most suitable methods for clustering the different types of words.
Jaroslav Rozman, David Hříbek, František Zbořil
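
The compare-then-cluster step described in this abstract can be illustrated with a minimal sketch. It assumes plain Levenshtein distance, a length-normalized similarity, and a simple "join the first sufficiently similar cluster" rule; the function names, the threshold of 0.7, and the sample names are hypothetical and are not taken from the paper.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # drop ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cluster(words: List[str], threshold: float = 0.7) -> List[List[str]]:
    """Assumed rule: a word joins the first cluster whose first member is similar enough."""
    clusters: List[List[str]] = []
    for word in words:
        normalized = word.strip().lower()  # minimal pre-processing
        for group in clusters:
            if similarity(normalized, group[0].strip().lower()) >= threshold:
                group.append(word)
                break
        else:
            clusters.append([word])
    return clusters


if __name__ == "__main__":
    print(cluster(["Novak", "Nowak", "Novák", "Schmidt", "Schmid"]))
```

Swapping `similarity` for a Jaro-Winkler or phonetic comparison would leave the clustering loop unchanged, which is one way to compare the approaches the abstract lists on the same word types.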

### Concept Similarity Under the Agent’s Preferences for the Description Logic $\mathcal{ALEH}$

Abstract
Computing the degree of concept similarity is an essential problem in description logic ontologies as it has contributions in various applications. However, many computational approaches to concept similarity do not take into account the logical relationships defined in an ontology. Moreover, they cannot be personalized to subjective factors (i.e. the agent’s preferences). This work introduces a computational approach to concept similarity for the description logic $\mathcal{ALEH}$. Our approach computes the degree of similarity between two concept descriptions structurally under the agent’s preferences. Hence, the derived degree is analyzed based on the logical definitions defined in an ontology. We also illustrate its applicability in rice disease detection, in which a farmer queries for relevant disease based on an agricultural observation.
Teeradaj Racharak, Watanee Jearanaiwongkul, Chutiporn Anutariya

### Data Quality for Deep Learning of Judgment Documents: An Empirical Study

Abstract
The revolution in hardware technology has made it possible to obtain high-definition data through highly sophisticated algorithms. Deep learning has emerged and is widely used in various fields, and the judicial area is no exception. As the carrier of the litigation activities, the judgment documents record the process and results of the people’s courts, and their quality directly affects the fairness and credibility of the law. To be able to measure the quality of judgment documents, the interpretability of judgment documents has been an indispensable dimension. Unfortunately, due to various uncontrollable factors during the process, such as data transmission and storage, the data set for training usually has poor quality. Besides, due to the severe imbalance in the distribution of case data, data augmentation is essential to generate data for low-frequency cases. Based on the existing data set and the application scenarios, we explore data quality issues in four areas. Then we systematically investigate them to figure out their impact on the data set. After that, we compare the four dimensions to find out which one causes the most damage to the data set.
Jiawei Liu, Dong Wang, Zhenzhen Wang, Zhenyu Chen

### Aligning Sentences Between Comparable Texts of Different Styles

Abstract
Monolingual parallel corpora are crucial for training and evaluating text rewriting or paraphrasing models. Aligning parallel sentences between two large bodies of text is a key step toward the automatic construction of such parallel corpora. We propose a greedy alignment algorithm that makes use of strong unsupervised similarity measures. The algorithm aligns sentences with state-of-the-art accuracy while being more robust on corpora with special linguistic features. Using this alignment algorithm, we automatically constructed a large English parallel corpus from various translated works of classic literature.
Xiwen Chen, Mengxue Zhang, Kenny Qili Zhu
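
To make the idea of greedy sentence alignment concrete, here is a minimal sketch under stated assumptions: it uses Jaccard token overlap as a stand-in for the paper's unsupervised similarity measures, allows only one-to-one alignments, and uses a hypothetical `min_score` cutoff. It is an illustration of the general technique, not the authors' algorithm.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-set overlap; a simple stand-in for a stronger similarity measure."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def greedy_align(source: List[str], target: List[str],
                 min_score: float = 0.3) -> List[Tuple[int, int, float]]:
    """Repeatedly accept the best-scoring unused (source, target) pair."""
    candidates = [
        (jaccard(s, t), i, j)
        for i, s in enumerate(source)
        for j, t in enumerate(target)
    ]
    # Highest score first; indices break ties deterministically.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_source, used_target = set(), set()
    pairs: List[Tuple[int, int, float]] = []
    for score, i, j in candidates:
        if score < min_score:
            break  # all remaining candidates are weaker
        if i in used_source or j in used_target:
            continue  # each sentence is aligned at most once
        used_source.add(i)
        used_target.add(j)
        pairs.append((i, j, score))
    return sorted(pairs)


if __name__ == "__main__":
    src = ["the cat sat on the mat", "he went home"]
    tgt = ["he went back home", "a cat sat on a mat"]
    for i, j, score in greedy_align(src, tgt):
        print(f"{src[i]!r} <-> {tgt[j]!r} ({score:.2f})")
```

The greedy choice keeps the procedure simple and fast at the cost of global optimality; replacing `jaccard` with an embedding-based similarity would not change the alignment loop.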

### An In-depth Analysis of Graph Neural Networks for Semi-supervised Learning

Abstract
Graph Neural Networks have experienced rapid development in the last few years and have become powerful tools for many machine learning tasks in the graph domain. The Graph Convolution Network (GCN) is a breakthrough and has become a strong baseline for the node classification task. To this end, we perform thorough experiments on several prominent GCN-related models, including GAT, AGNN, Co-Training GCN and Stochastic GCN. We found that different models have advantages in different scenarios, depending on training set size, graph structure and dataset. Through our in-depth analysis of the attention mechanism, dataset splits and the preprocessing for knowledge graphs, we report some interesting findings. We also look closely at GCNs for knowledge graphs and propose a new scheme for data processing, which achieves better performance than traditional methods.
Yuyan Chen, Sen Hu, Lei Zou

### XTransE: Explainable Knowledge Graph Embedding for Link Prediction with Lifestyles in e-Commerce

Abstract
In e-Commerce, we are interested in deals by lifestyle, which will improve the diversity of items shown to users. A lifestyle, an important motivation for consumption, is a person’s pattern of living in the world as expressed in activities, interests, and opinions. In this paper, we focus on the key task for deals by lifestyle: establishing linkage between items and lifestyles. We build an item-lifestyle knowledge graph to fully utilize the information about them and formulate the task as knowledge graph link prediction. Many knowledge graph embedding methods have been proposed in academia to accomplish relational learning. Although these methods achieve impressive results on benchmark datasets, they cannot provide insights into or explanations for their predictions, which limits their usage in industry. In this scenario, we care not only about link prediction results, but also about explanations for the predicted results and human-understandable rules, because explanations help us deal with uncertainty from algorithms and rules can be easily transferred to other platforms. Our proposal includes an explainable knowledge graph embedding method (XTransE), an explanation generator and a rule collector. It outperforms traditional classifier models and the original embedding method during prediction, and successfully generates explanations and collects meaningful rules.
Wen Zhang, Shumin Deng, Han Wang, Qiang Chen, Wei Zhang, Huajun Chen

### Feasibility Study: Rule Generation for Ontology-Based Decision-Making Systems

Abstract
Ontology-based systems can offer enticing benefits for autonomous vehicle applications. One such system is an ontology-based decision-making system. This system takes advantage of highly abstracted semantic knowledge that describes the state of the vehicle as well as the state of its environment. Knowledge on scenario state combined with a set of logical rules is then used to determine correct actions for the vehicle. However, creating a set of rules for this safety-critical application is a challenging problem which must be solved to enable the use of the decision-making system in practical applications. This work explores the feasibility of generating rules for the reasoning system through machine learning. We propose a process for the rule generation and create a set of rules describing vehicle behavior in an uncontrolled four-way intersection.
Juha Hovi, Ryutaro Ichise

### Attention-Based Direct Interaction Model for Knowledge Graph Embedding

Abstract
Knowledge graph embedding aims at learning low-dimensional representations for entities and relations in a knowledge graph. Previous knowledge graph embedding methods usually assign a score to each triple in order to measure its plausibility. Despite the effectiveness of these models, they ignore fine-grained clues (matching signals between entities and relations), since their scores are mainly obtained by manipulating the triple as a whole. To address this problem, we instead propose a model which first produces diverse features of entities and relations by multi-head attention and then introduces an interaction mechanism to incorporate matching signals between entities and relations. Experiments show that our model achieves better link prediction performance than multiple strong baselines on two benchmark datasets, WN18RR and FB15k-237.
Bo Zhou, Yubo Chen, Kang Liu, Jun Zhao
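
As background for what "assigning a score to each triple" means, the following sketch shows the well-known TransE scoring function, which treats a relation as a translation from head to tail and scores the whole triple at once. It is given only as an example of the earlier whole-triple methods the abstract contrasts with, not as the attention-based model proposed in the paper; the toy embeddings and entity names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def transe_score(head: Sequence[float], relation: Sequence[float],
                 tail: Sequence[float]) -> float:
    """TransE plausibility: negative L2 norm of (head + relation - tail).

    Scores closer to zero mean the triple is considered more plausible.
    """
    return -math.sqrt(sum((h + r - t) ** 2
                          for h, r, t in zip(head, relation, tail)))


def rank_tails(head: Sequence[float], relation: Sequence[float],
               entities: Dict[str, Sequence[float]]) -> List[str]:
    """Link prediction as ranking: order candidate tails by descending score."""
    return sorted(entities,
                  key=lambda name: transe_score(head, relation, entities[name]),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings, chosen by hand for illustration.
    entities = {"paris": [1.0, 1.0], "tokyo": [4.0, 0.0], "lyon": [1.5, 1.0]}
    france, capital_of_inverse = [0.0, 0.0], [1.0, 1.0]
    print(rank_tails(france, capital_of_inverse, entities))
```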

### Discovering Hypernymy Relationships in Chinese Traffic Legal Texts

Abstract
Currently, knowledge graphs play a crucial role in knowledge-based applications such as semantic search and data integration. Due to the particularity of the vocabulary and language patterns in the Chinese legal domain, the exploration of hierarchical legal knowledge structures is still challenging. In this paper, we first explore a combination of pattern-based and linguistic-rule-based approaches to help experts identify hypernymy relationships in a large-scale traffic legal corpus. Using these relationships as ground truth, we then propose a supervised hypernymy classification of candidate term pairs using an attention-based bidirectional LSTM model, in which a global context of each candidate is defined as the feature for classification. We compare the performance of our approach with state-of-the-art baselines on real-world data. The evaluation results show that our approach is quite effective in finding Chinese hypernym-hyponym pairs in the traffic legal domain.
Peng Gao, Xiang Zhang, Guilin Qi

### Multi-task Learning for Attribute Extraction from Unstructured Electronic Medical Records

Abstract
Electronic medical records have been widely used in hospitals to store patient information in a digital format, which makes it convenient to reuse patients’ medical data for teaching and scientific research. It is also convenient to analyze and mine the patients’ data, so as to provide a basis for medical research. However, most of the existing methods are based on the structured data of electronic medical records, and research on unstructured text is rare, so a lot of important information is lost. In this paper, we focus on attribute extraction from the unstructured text of electronic medical records, and propose a multi-task learning model that jointly learns related tasks to help improve the generalization performance of all the tasks. Specifically, we use an end-to-end neural network model to extract different attribute values from the same unstructured text. We take each sentence/segment of the text as an instance. For each instance, we first use pre-trained word embeddings to better initialize our neural network models, then we fine-tune them on our domain corpus to capture domain-specific semantics/knowledge. Considering that the importance of different instances for attribute extractors is not equal, we also use an attention mechanism to select the most important instances for those attribute extractors. Finally, our model uses multi-task learning by solving multiple multi-class classification problems simultaneously. Experimental results show the effectiveness of our method.
Ming Du, Minmin Pang, Bo Xu

### Uncertain Ontology-Aware Knowledge Graph Embeddings

Abstract
Much attention has recently been given to knowledge graph embedding, which exploits latent and semantic relations among entities and incorporates the structured knowledge they contain into machine learning. Most of the existing graph embedding models can only encode a simple model of the data, while few models are designed for ontology-rich knowledge graphs. Furthermore, many automated knowledge construction tools produce modern knowledge graphs with rich semantics and uncertainty. However, there is no graph embedding model which incorporates uncertain ontological information. In this paper, we propose a novel embedding model, UOKGE (Uncertain Ontology-aware Knowledge Graph Embeddings), which learns embeddings of entities, classes, and properties on uncertain ontology-aware knowledge graphs according to confidence scores. The proposed method preserves both the structure and the uncertainty of knowledge in the embedding space. Specifically, UOKGE encodes each entity in a knowledge graph as a point in an n-dimensional vector space, each class as an n-sphere and each property as a 2n-sphere in the same semantic space. This representation allows for the natural expression of uncertain ontological triples. The preliminary experimental results show that UOKGE can robustly learn representations of uncertain ontology-aware knowledge graphs when evaluated on a benchmark dataset.
Khaoula Boutouhami, Jiatao Zhang, Guilin Qi, Huan Gao
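
The geometric picture in this abstract (entities as points, classes as n-spheres, properties as 2n-spheres) can be sketched as follows. The abstract does not say how membership or uncertainty is computed, so everything here is an assumption made for illustration: an entity is treated as an instance of a class when its point lies inside the class sphere, and a property is treated as holding for a pair when the concatenated subject and object vectors lie inside the property sphere. All names, centers and radii are hypothetical, and confidence scores are not modeled.

```python
import math
from typing import Sequence


def inside_sphere(point: Sequence[float], center: Sequence[float],
                  radius: float) -> bool:
    """True when the point lies within (or on) the sphere."""
    return math.dist(point, center) <= radius


def is_instance(entity: Sequence[float], class_center: Sequence[float],
                class_radius: float) -> bool:
    """Assumed reading: class membership is containment in an n-sphere."""
    return inside_sphere(entity, class_center, class_radius)


def holds(subject: Sequence[float], obj: Sequence[float],
          property_center: Sequence[float], property_radius: float) -> bool:
    """Assumed reading: a pair is one 2n-dimensional point (subject ++ object)."""
    pair = list(subject) + list(obj)
    return inside_sphere(pair, property_center, property_radius)


if __name__ == "__main__":
    alice, acme = [0.1, 0.2], [1.0, 1.1]
    print(is_instance(alice, class_center=[0.0, 0.0], class_radius=0.5))
    print(holds(alice, acme, property_center=[0.0, 0.0, 1.0, 1.0],
                property_radius=0.5))
```

A graded variant could turn the distance to the sphere boundary into a soft score, which would be one way to relate the geometry to confidence values, but that step is not specified in the abstract.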

### Investigating Schema Definitions Using RDFS and OWL 2 for RDF Databases in Life Sciences

Abstract
With the development of measuring instruments, life science databases are becoming larger and more heterogeneous. As a step towards providing integrated databases, many life science databases have been published as Linked Open Data (LOD). To utilize such databases efficiently, it is desirable that the schema, such as class–class relations, can be acquired in advance from SPARQL Protocol and RDF Query Language (SPARQL) endpoints. However, a SPARQL query to obtain the schema from a SPARQL endpoint often fails because it is time-consuming and places an excessive load on the server. On the other hand, many datasets include definitions that use standard vocabularies, such as RDF Schema 1.1 and OWL 2. If the database schema is properly described and provided using RDF Schema 1.1 or OWL 2, it is no longer necessary to obtain it by exhaustively crawling the SPARQL endpoints. Therefore, we investigated the extent of the schema definitions in life science databases, focusing on seven specific patterns related to properties using RDF Schema 1.1 or OWL 2. We found that for some datasets, the patterns of domain and range definitions using RDF Schema 1.1 are relatively well defined for properties. However, there are few patterns using OWL 2 as schema definitions for properties. Additionally, we validated RDF datasets by restricting the patterns of domain and range definitions of RDF Schema 1.1. Subsequently, we found that RDF datasets follow these restrictions.
Atsuko Yamaguchi, Tatsuya Kushida, Yasunori Yamamoto, Kouji Kozaki

### RQE: Rule-Driven Query Expansion to Solve Empty Answers in SPARQL

Abstract
A branch of question answering approaches translates natural language questions to SPARQL queries. The empty answer problem exists even when we have properly-translated ones, due to the heterogeneity and incompleteness of knowledge graphs. Existing methods use similarities, ontologies or embeddings to relax failed queries and obtain approximate answers, but they may lose efficacy in approximating simple queries with only one or two constraints because of their low accuracy and suitability for over-constrained ones. In this paper, we propose a rule-driven query expansion approach to expand failed queries for obtaining more accurate approximate answers. Specifically, we first automatically build high-quality rule sets for predicates in failed queries with rule learning techniques. Then, we use the learned rules to expand failed queries to get approximate answers and explain the reasons why we choose these answers. We develop two datasets to evaluate the effectiveness and efficiency of our approach and the results show that our approach achieves better results than several approaches based on similarities, ontologies and embeddings in approximating simple queries.
Xinze Lyu, Wei Hu

### Aspect-Level Sentiment Analysis of Online Product Reviews Based on Multi-features

Abstract
Aspect-level sentiment analysis aims to identify the sentiment polarity of fine-grained opinion targets. Existing methods are usually applied to structured standard datasets. We propose a model for a specific dataset which has a complex structure. First, we utilize some matching rules to extract implicit aspects, then we use the extracted aspect words to segment the corpus into samples. Finally, we propose a set of methods to construct data-based features, and try to fuse multiple features for classifier training. Experiments show that the method that integrates three features has the highest F1 score, and that its sentiment analysis results are more accurate.
Binhui Wang, Ruiqi Wang, Shujun Liu, Yanyu Chai, Shusong Xing

### A Seq2seq-Based Approach to Question Answering over Knowledge Bases

Abstract
Semantic parsing, as an essential approach to question answering over knowledge bases (KBQA), transforms a question into query graphs for further generating logical queries. Existing semantic parsing approaches in KBQA mainly focus on relations (called local semantics) while paying less attention to the relationships among relations (called global semantics). In this paper, we present a seq2seq-based semantic parsing approach to improve the performance of KBQA by converting the problem of identifying question types into a machine translation problem. Firstly, we introduce a BiLSTM-based named entity recognition (NER) method to extract all classes of entities occurring in questions. Secondly, we present an attention-based seq2seq model to learn the type of a question by applying the seq2seq model to extract relationships among classes. Finally, we generate templates to cover more question types for matching more complex questions. The experimental results on a real Chinese film knowledge base show that our approach outperforms the existing template matching model.
Linjuan Wu, Peiyun Wu, Xiaowang Zhang

### Building Knowledge Graph Across Different Subdomains Using Interlinking Ontology for Biomedical Concepts

Abstract
This paper proposes a method for building knowledge graphs across different subdomains in life science using the Interlinking Ontology for Biological Concepts (IOBC). IOBC provides a wide range of concepts related to biomedical domains, with relationships between concepts across different subdomains. The proposed method obtains some relationships according to the interests of the users. Then, it combines these relationships with mappings from related concepts to other RDF datasets and constructs new knowledge graphs using them. This paper introduces the building method, which consists of five steps, together with some results of trial constructions of knowledge graphs.
Kouji Kozaki, Tatsuya Kushida, Yasunori Yamamoto, Toshihisa Takagi

### WPQA: A Gaming Support System Based on Machine Learning and Knowledge Graph

Abstract
Honor of Kings is a multiplayer online battle arena game in which two teams fight each other, with five players controlling five different heroes on each side. By 2017, Honor of Kings had over 80 million daily active players and 200 million monthly active players, and was both the world’s most popular and highest-grossing game of all time as well as the most downloaded gaming app globally. In this paper, we introduce a prediction model based on a machine learning algorithm to forecast victory in Honor of Kings 5v5 games by considering the hero formation on each side, using a gaming history dataset.
Luwei Wang, Yan Tang, Jie Liu

### Combining Concept Graph with Improved Neural Networks for Chinese Short Text Classification

Abstract
With the development of the Internet, network information is booming, and a large amount of short text data has brought more timely and comprehensive information to people. How to find the required information quickly and accurately in this mass of information is a focus of the industry, and short text processing is one of the key technologies. Because short texts are sparse and noisy, traditional classification methods cannot provide good support. At present, the research on short text classification mainly focuses on two aspects: feature processing and classification algorithms. Most feature processing methods only use literal text information when performing feature expansion, and thus lack the ability to discriminate the polysemy that is common in Chinese. The classification algorithms also suffer from problems such as insufficient input features and unsatisfactory classification performance. In order to improve the accuracy of Chinese short text classification, this paper proposes a method based on an improved convolutional recurrent neural network and a concept graph, which achieves better classification results than existing algorithms.
Jialu Liao, Fanke Sun, Jinguang Gu

### Construction of Chinese Pediatric Medical Knowledge Graph

Abstract
The knowledge graph is a promising method for knowledge management in the big data era. Pediatrics, as an essential branch of clinical medicine, has accumulated a large amount of medical data. This paper applies the knowledge graph technique in pediatric studies and proposes a method for Chinese pediatric medical knowledge graph (PMKG) construction. The proposed method has a conceptual layer and a data layer. At the conceptual layer we analyze the semantic characteristics of multi-source pediatrics data, formulate the annotation scheme of entity and entity relationship, and extend the traditional triplet form of knowledge graph to a sextuplet form. At the data layer, guided by the annotation scheme, information is extracted from data sources using entity recognition and relationship extraction. Manual annotation, knowledge fusion and other technologies are used to construct a pediatric knowledge graph. The PMKG contains 22,023 entities and 34,434 sextuplets.
Yu Song, Linkun Cai, Kunli Zhang, Hongying Zan, Tao Liu, Xiaohui Ren

### EasyKG: An End-to-End Knowledge Graph Construction System

Abstract
We present an end-to-end system, called EasyKG, that covers the whole lifecycle of knowledge graph (KG) construction. It has a pluggable pipeline architecture containing components for knowledge modeling, knowledge extraction, knowledge reasoning, knowledge management and so forth. Users can automatically generate such a pipeline so as to obtain a domain-specific KG. Advanced users are allowed to create a pipeline in a drag-and-drop manner with customized components. EasyKG lowers the barriers to KG construction. Moreover, EasyKG allows users to evaluate different components and KGs, and to share them across different domains so as to further reduce the cost of construction.
Yantao Jia, Dong Liu, Zhicheng Sheng, Letian Feng, Yi Liu, Shuo Guo

### Backmatter
