
About This Book

This book constitutes the refereed proceedings of the 4th China Conference on Knowledge Graph and Semantic Computing, CCKS 2019, held in Hangzhou, China, in August 2019. The 18 revised full papers presented were carefully reviewed and selected from 140 submissions. The papers cover a wide range of research fields, including knowledge graphs, the Semantic Web, linked data, NLP, information extraction, and knowledge representation and reasoning.



Adaptive Multilingual Representations for Cross-Lingual Entity Linking with Attention on Entity Descriptions

Cross-lingual entity linking is the task of resolving ambiguous mentions in text to their corresponding entities in a knowledge base, where the query text and the knowledge base are in different languages. Recent multilingual-embedding-based methods have brought significant progress to this task. However, they still face two problems: (1) they directly use multilingual embeddings obtained by cross-lingual mapping, which may introduce noise and degrade performance; (2) they rely on pre-trained, fixed entity embeddings, which carry only limited information about entities. In this paper, we propose a cross-lingual entity linking framework built on more adaptive representations. For the first problem, we apply trainable adjusting matrices to fine-tune the semantic representations built from multilingual embeddings. For the second problem, we introduce attention mechanisms over entity descriptions to obtain dynamic entity representations, exploiting more clues about entity candidates according to the query mentions. Experiments on the TAC KBP 2015 Chinese-English cross-lingual entity linking dataset show that our model outperforms state-of-the-art models.
Chenhao Wang, Yubo Chen, Kang Liu, Jun Zhao
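The two ideas in the abstract above, a trainable adjusting matrix applied to mapped multilingual embeddings and attention over entity-description words, can be sketched roughly as follows. This is a minimal NumPy illustration; all vectors, matrices and dimensions are hypothetical stand-ins, not the authors' actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 4

# A word embedding already mapped into the shared multilingual space (assumed given).
mapped_vec = rng.normal(size=dim)

# Trainable adjusting matrix, initialized near identity so fine-tuning
# starts from the original cross-lingual mapping.
adjust = np.eye(dim) + 0.01 * rng.normal(size=(dim, dim))
adjusted_vec = adjust @ mapped_vec

# Attention over description-word vectors, scored against the mention context.
description_words = rng.normal(size=(5, dim))  # 5 words of an entity description
mention_context = rng.normal(size=dim)

scores = description_words @ mention_context
weights = np.exp(scores - scores.max())
weights /= weights.sum()                       # softmax attention weights

# Dynamic entity representation: attention-weighted sum of description words.
entity_repr = weights @ description_words
```

The weights depend on the query mention, so the same entity gets a different representation for different mentions, which is the "dynamic" aspect the abstract describes.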

Context-Dependent Representation of Knowledge Graphs

Recently, there has been growing interest in leveraging a graph’s structural information for knowledge representation. However, existing methods fail to capture global connectivity patterns in knowledge graphs or to depict the unique structural properties of different graph contexts. In this paper, we propose a novel representation framework, Context-dependent Representation of Knowledge Graphs (CRKG), to exploit the diversity of a graph’s structural information for knowledge representation. We introduce triplet context to effectively capture semantic information from two types of graph structure around a triple. One is the K-degree neighborhood of the source entity in the target triple, which captures global connectivity patterns of entities. The other is the set of relation paths between the entity pair in the target triple, reflecting rich inference patterns between entities. Considering the distinct characteristics of the two kinds of triplet context, we design separate embedding strategies to preserve their diverse connectivity patterns. Experimental results on three challenging datasets show that CRKG yields significant improvements over baselines on the link prediction task.
Binling Nie, Shouqian Sun

Construction and Application of Teaching System Based on Crowdsourcing Knowledge Graph

[Objective] Through the combination of a crowdsourcing knowledge graph and a teaching system, to research methods for generating a knowledge graph and its applications. [Method] Two crowdsourcing approaches, crowdsourcing task distribution and reverse captcha generation, are used to construct a knowledge graph in the teaching-system domain. [Results] A complete hierarchical knowledge graph of the teaching domain is generated from nodes for school, student, teacher, course, knowledge point, and exercise type. [Limitations] A knowledge graph constructed in a crowdsourcing manner requires many users to participate collaboratively, with full consideration of teachers’ guidance and of user-mobilization issues. [Conclusion] Based on three subgraphs of the knowledge graph, prominent teachers, students’ learning situations, and suitable learning routes can be visualized. [Application] A personalized exercise recommendation model formulates personalized exercises algorithmically based on the knowledge graph, and a collaborative creation model realizes the crowdsourcing construction mechanism. [Evaluation] Despite unfamiliarity with the knowledge-graph learning mode and learners’ limited attention to knowledge structure, a system based on a crowdsourcing knowledge graph can still achieve high acceptance among students and teachers.
Jinta Weng, Ying Gao, Jing Qiu, Guozhu Ding, Huanqin Zheng

Cross-Lingual Entity Linking in Wikipedia Infoboxes

Infoboxes in Wikipedia are valuable resources for extracting structured information about entities; several large-scale knowledge graphs, including DBpedia and YAGO, are built by processing infobox data. Entity links annotated as hyperlinks in infoboxes are the key to extracting entity relations. However, many entity links in infoboxes are missing because the mentioned entities do not exist in the current language version of Wikipedia. This paper presents an approach for automatically linking mentions in infoboxes to their corresponding entities in another language when the target entities are absent from the current language. Our approach first builds a cross-lingual mention-entity vocabulary from the cross-lingual links in Wikipedia, which is then used to generate cross-lingual candidate entities for mentions. After that, our approach performs entity disambiguation using a cross-lingual knowledge graph embedding model. Experiments show that our approach can discover cross-lingual entity links with high accuracy.
Juheng Yang, Zhichun Wang
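The candidate-generation step described above amounts to a lookup in a cross-lingual mention-to-entity dictionary built from Wikipedia's inter-language links. A minimal sketch, where the vocabulary entries are invented examples rather than data from the paper:

```python
# Cross-lingual mention-entity vocabulary: a surface form in one language
# maps to candidate entities in the target language (entries illustrative).
mention_entity_vocab = {
    "巴黎": ["Paris", "Paris (Texas)"],
    "华盛顿": ["Washington, D.C.", "George Washington"],
}

def generate_candidates(mention):
    """Return cross-lingual candidate entities for a mention,
    or an empty list if the mention is out of vocabulary."""
    return mention_entity_vocab.get(mention, [])
```

The disambiguation stage would then rank these candidates, in the paper's case with a cross-lingual knowledge graph embedding model.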

Incorporating Domain and Range of Relations for Knowledge Graph Completion

Knowledge graphs store facts as triples, each containing two entities and one relation. Information about entities and relations is important for knowledge-graph-related tasks such as link prediction. Knowledge graph embedding methods embed entities and relations into a continuous vector space and accomplish link prediction by calculating with the embeddings. However, some embedding methods focus only on the information in triples and ignore individual information about relations. For example, relations inherently have a domain and a range, which can contribute much to learning even when they are not explicitly given in knowledge graphs. In this paper, we propose a framework TransX\(_C\) (X can be replaced with E, H, R or D) to preserve this individual information about relations; it can be applied to multiple traditional translation-based embedding methods (i.e., TransE, TransH, TransR and TransD). In TransX\(_C\), we use two logistic regression classifiers to model the domain and range of relations respectively, and we train the embedding model and the classifiers jointly so as to incorporate the information in triples as well as the domain and range of relations. The performance of TransX\(_C\) is evaluated on the link prediction task. Experimental results show that our method outperforms the corresponding translation-based models, indicating the effectiveness of incorporating the domain and range of relations into link prediction.
Juan Li, Wen Zhang, Huajun Chen
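The two ingredients combined in the abstract above, a translation-based triple score and a logistic-regression classifier for a relation's domain or range, can be sketched as follows. This is a toy NumPy illustration with random vectors; the entity names, dimensions and parameters are hypothetical, and in the paper both parts are trained jointly rather than fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Toy embeddings (randomly initialized here; learned jointly in practice).
entity_emb = {"Paris": rng.normal(size=dim), "France": rng.normal(size=dim)}
relation_emb = {"capital_of": rng.normal(size=dim)}

def transe_score(h, r, t):
    """TransE plausibility: smaller ||h + r - t|| means a more plausible triple."""
    return float(np.linalg.norm(entity_emb[h] + relation_emb[r] - entity_emb[t]))

def range_prob(entity, w, b):
    """Logistic-regression head estimating whether an entity belongs to a
    relation's range; (w, b) are per-relation parameters, here random."""
    return float(1.0 / (1.0 + np.exp(-(entity_emb[entity] @ w + b))))

w, b = rng.normal(size=dim), 0.0
```

A joint loss would combine the margin-based ranking loss on transe_score with the cross-entropy loss of the domain and range classifiers.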

REKA: Relation Extraction with Knowledge-Aware Attention

Relation extraction (RE) is an important task with wide applications. Distant supervision is widely used in RE methods, since it can automatically construct labeled data and thus reduce the manual annotation effort. However, it usually produces many instances with incorrect labels. In addition, most existing relation extraction methods rely merely on the textual content of sentences to extract relations. In fact, many knowledge graphs are available off the shelf and can provide useful information about entities and relations, which has the potential to alleviate the noisy-data problem and improve the performance of relation extraction. In this paper, we propose a knowledge-aware attention model to incorporate knowledge graph information into relation extraction. In our approach, we first learn representations of entities and relations from the knowledge graph using graph embedding methods. Then we propose a knowledge-aware word attention model to select the informative words in sentences for relation extraction. In addition, we propose a knowledge-aware sentence attention model to select useful sentences for RE, alleviating the noisy-data problem brought by distant supervision. We conduct experiments on a widely used dataset, and the results show that our approach can effectively improve the performance of neural relation extraction.
Peiyi Wang, Hongtao Liu, Fangzhao Wu, Jinduo Song, Hongyan Xu, Wenjun Wang

Research on Construction and Automatic Expansion of Multi-source Lexical Semantic Knowledge Base

With growing research on improving the performance of deep learning models by combining them with the rich knowledge resources in traditional knowledge bases, building knowledge bases has become a hot research topic. How to use the rich semantic information of existing knowledge bases such as HowNet and Tongyici Cilin to build a more comprehensive and higher-quality knowledge graph has become a focus of scholarly research. In this work, we propose a way to integrate information from multiple knowledge bases to build a new knowledge base, combined with deep learning techniques to expand it. We build a multi-source lexical semantic knowledge base through the steps of new ontology construction, data cleaning and fusion, and new-knowledge expansion. On top of this knowledge base, we use a graph database and JavaScript scripts to store and visualize the data, respectively. Through experiments, we obtained a lexical semantic knowledge base containing 153,754 nodes, 1,598,356 triples and 137 relation types. It can provide accurate and convenient knowledge services, and its large body of semantic knowledge resources can support research on semantic retrieval, intelligent question answering, semantic relation extraction, semantic relevance calculation and automatic ontology construction [1].
Siqi Zhu, Yi Li, Yanqiu Shao

A Survey of Question Answering over Knowledge Base

Question Answering over Knowledge Base (KBQA) is the problem of answering a natural language question accurately and concisely over knowledge bases. The core task of KBQA is to understand the real semantics of a natural language question and map them onto the semantics of a knowledge base. However, this is a big challenge due to the variable semantics of natural language questions in the real world. Recently, more and more off-the-shelf KBQA approaches have appeared in many applications. It becomes interesting to compare and analyze them so that users can choose well. In this paper, we give a survey of KBQA approaches by classifying them into two categories. Following the two categories, we introduce the current mainstream techniques in KBQA and discuss the similarities and differences among them. Finally, based on this discussion, we point out some interesting open problems.
Peiyun Wu, Xiaowang Zhang, Zhiyong Feng

Fast Neural Chinese Named Entity Recognition with Multi-head Self-attention

Named entity recognition (NER) is an important task in natural language processing. It is an essential step for many downstream tasks, such as relation extraction and entity linking, which are important for knowledge graph building and application. Existing neural NER methods are usually based on the LSTM-CRF framework and its variants. However, since the LSTM network has high time complexity, the efficiency of these LSTM-CRF based NER methods is usually unsatisfactory. In this paper, we propose a fast neural NER model for Chinese texts. Our approach is based on the CNN-SelfAttention-CRF architecture, where a convolutional neural network (CNN) learns contextual character representations from local contexts, a multi-head self-attention network learns contextual character representations from global contexts, and a conditional random field (CRF) jointly decodes the labels of the characters in a sentence. Since both the CNN and the self-attention network can be computed in parallel, our approach is more efficient than LSTM-CRF based methods. Extensive experiments on two benchmark datasets validate that our approach is more efficient than existing neural NER methods and can achieve comparable or even better performance on Chinese NER.
Tao Qi, Chuhan Wu, Fangzhao Wu, Suyu Ge, Junxin Liu, Yongfeng Huang, Xing Xie
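The multi-head self-attention layer in the architecture above is what lets every character attend to every other character in parallel. A minimal NumPy sketch of scaled dot-product multi-head attention, with made-up dimensions and random projection weights standing in for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 8, 2
d_head = d_model // n_heads

x = rng.normal(size=(seq_len, d_model))  # character representations

# Hypothetical query/key/value projections (learned in the real model).
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

def split_heads(m):
    # (seq_len, d_model) -> (n_heads, seq_len, d_head)
    return m.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

Q, K, V = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)

# Scaled dot-product attention per head: every position attends to all
# positions, which is why this layer captures global context in parallel.
scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Concatenate heads back into (seq_len, d_model).
out = (weights @ V).transpose(1, 0, 2).reshape(seq_len, d_model)
```

Unlike an LSTM, none of these matrix products depends on the previous position's output, which is the source of the efficiency gain the abstract claims.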

A Practical Framework for Evaluating the Quality of Knowledge Graph

Knowledge graphs have become much larger and more complex during the past several years due to their wide application in knowledge discovery. Many knowledge graphs were built using automated construction tools and via crowdsourcing. Such a graph may contain a significant number of syntactic and semantic errors that greatly impact its quality. A low-quality knowledge graph produces low-quality applications built on it. Therefore, evaluating the quality of a knowledge graph is necessary for building high-quality applications. Many frameworks have been proposed for the systematic evaluation of knowledge graphs, but they are either too complex to be practical or lack scalability to large-scale knowledge graphs. In this paper, we conduct a comprehensive study of existing frameworks and propose a practical framework for evaluating the “fit for purpose” quality of knowledge graphs. We first select a set of quality dimensions and their corresponding metrics based on the requirements of knowledge discovery over knowledge graphs, through a systematic investigation of representative published applications. Then we recommend an approach for evaluating each metric, considering its feasibility and scalability. The framework can be used to check the essential quality requirements of knowledge graphs for serving the purpose of knowledge discovery.
Haihua Chen, Gaohui Cao, Jiangping Chen, Junhua Ding

Entity Subword Encoding for Chinese Long Entity Recognition

Named entity recognition (NER) is a fundamental and important task in natural language processing, which jointly predicts entity boundaries and pre-defined categories. For the Chinese NER task, recognition of long entities has not yet been well addressed. As the character sequences of entities become longer, Chinese NER becomes more difficult for existing character-based and word-based neural methods. In this paper, we investigate Chinese NER methods that operate on subword units and propose to recognize Chinese long entities based on subword encoding. First, our method generates subword units from known entities, which avoids the noise introduced by Chinese word segmentation and eases the determination of long entity boundaries. Then the subword-character mixed sequences of sentences are fed into character-based neural methods to perform Chinese NER. We apply our method to iterated dilated convolutional neural networks (ID-CNNs) with conditional random fields (CRF) for entity recognition. Experimental results on the benchmark People’s Daily and Weibo datasets show that our subword-based method achieves strong performance on long entity recognition.
Changyu Hou, Meiling Wang, Changliang Li
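The subword-character mixing step above can be approximated with a greedy longest-match over a lexicon of known entities: spans that match an entity become single subword units, and everything else falls back to characters. A minimal sketch under that assumption (the lexicon entries and matching strategy are illustrative, not the paper's exact procedure):

```python
# Lexicon of known entities that become single subword units (illustrative).
entity_lexicon = {"北京大学", "人民日报"}

def to_mixed_sequence(sentence, lexicon, max_len=6):
    """Greedy longest-match segmentation: replace known entity spans with
    single subword units; remaining text falls back to characters."""
    tokens, i = [], 0
    while i < len(sentence):
        for span_len in range(min(max_len, len(sentence) - i), 1, -1):
            span = sentence[i:i + span_len]
            if span in lexicon:
                tokens.append(span)      # one subword unit for the entity
                i += span_len
                break
        else:
            tokens.append(sentence[i])   # single character
            i += 1
    return tokens
```

For example, to_mixed_sequence("我在北京大学学习", entity_lexicon) yields ["我", "在", "北京大学", "学", "习"], so a long entity occupies one position in the input sequence instead of four.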

AliMe KBQA: Question Answering over Structured Knowledge for E-Commerce Customer Service

With the rise of the knowledge graph (KG), question answering over knowledge base (KBQA) has attracted increasing attention in recent years. Although much research has been conducted on this topic, it is still challenging to apply KBQA technology in industry because business knowledge and real-world questions can be rather complicated. In this paper, we present AliMe-KBQA, a bold attempt to apply KBQA in the E-commerce customer service field. To handle real knowledge and questions, we extend the classic “subject-predicate-object (SPO)” structure with property hierarchy, key-value structure and compound value types (CVT), and enhance traditional KBQA with constraint recognition and reasoning ability. We launched AliMe-KBQA in the Marketing Promotion scenario for merchants during the “Double 11” period in 2018 and in other promotional events afterwards. Online results suggest that AliMe-KBQA not only achieves better resolution and improves customer satisfaction, but has also become the preferred knowledge management method for business knowledge staff, since it offers a more convenient and efficient management experience.
Feng-Lin Li, Weijia Chen, Qi Huang, Yikun Guo

CN-StartEnd: A Chinese Event Base Recording Start and End for Everything

Start and end are very important attributes in knowledge graphs. The entities or relations in a knowledge graph often have validity periods represented by start and end timestamps. For example, Obama’s birth date is the start of his life, and his departure from office is the end of his presidential career. We need to refer to start or end timestamps when dealing with temporal tasks such as temporal question answering. Existing Chinese knowledge graphs, with popular examples including CN-DBpedia and PKU-PIE, contain some unprocessed start and end timestamps for their entities. In Chinese, however, a large number of descriptions of the beginning or end of entities, relations and states lie in events. In this paper we introduce our work on constructing a Chinese event base that focuses on start-events and end-events. We extract more than 3 million event-temporal cases from infoboxes and natural-language texts of Chinese encyclopedias. After selection and matching, these event-temporal cases are reconstructed into a large-scale knowledge base that incorporates over 2.3 million start-events and 700 thousand end-events. Events that describe the same object and match our start-end templates are merged into more than 150 thousand start-end pairs. Dumps for CN-StartEnd are available at: http://eventkg.cn/cn_StartEnd.
Hualong Zhang, Liting Liu, Shuzhi Cheng, Wenxuan Shi

Overview of CCKS 2018 Task 1: Named Entity Recognition in Chinese Electronic Medical Records

CCKS 2018 presented a named entity recognition (NER) task focusing on Chinese electronic medical records (EMR). The Knowledge Engineering Group of Tsinghua University and Yidu Cloud Beijing Technology Co., Ltd. provided an annotated dataset for this task, which is the only publicly available dataset in the field of Chinese EMR. Using this dataset, 69 systems were developed for the task. Their performance showed that traditional CRF and Bi-LSTM models were the most popular choices. The best-performing system combined a CRF or Bi-LSTM model with complex feature engineering, indicating that feature engineering is still indispensable. The results also showed that performance on the task could be further augmented with rule-based systems for determining clinical named entities.
Jiangtao Zhang, Juanzi Li, Zengtao Jiao, Jun Yan

A Conditional VAE-Based Conversation Model

The recent sequence-to-sequence with attention (S2SA) model achieves high generation quality in modeling open-domain conversations. However, it often generates generic and uninformative responses. By incorporating abstract features drawn from a latent variable into the attention block, we propose a Conditional Variational Auto-encoder based neural conversation model that directly models a conversation as a one-to-many problem. We apply the proposed model on two datasets and compare with recent neural conversation models on automatic evaluation metrics. Experimental results demonstrate that the proposed model can generate more diverse, informative and interesting responses.
Junfan Chen, Richong Zhang, Yongyi Mao, Binfeng Wang, Jianhang Qiao
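The conditional VAE machinery mentioned above rests on two standard pieces: sampling the latent variable with the reparameterization trick, and a KL-divergence term that regularizes the approximate posterior toward the prior. A minimal NumPy sketch with made-up recognition-network outputs (not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 4

# Recognition-network outputs (hypothetical values): mean and log-variance
# of the approximate posterior q(z | context, response).
mu = rng.normal(size=latent_dim)
logvar = rng.normal(size=latent_dim) * 0.1

# Reparameterization trick: sample z differentiably as mu + sigma * eps.
eps = rng.normal(size=latent_dim)
z = mu + np.exp(0.5 * logvar) * eps

# KL divergence between q and a standard-normal prior: the regularizer
# added to the reconstruction loss in the CVAE training objective.
kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
```

At generation time, different samples of z condition the decoder on different abstract features, which is how a single input context can map to many diverse responses.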

Emerging Entity Discovery Using Web Sources

The rapidly increasing number of entities in knowledge bases (KBs) can benefit many applications, where the key issue is to link entity mentions in text with entities in the KB, a task called entity linking (EL). Many methods have been proposed to tackle this problem. However, the KB can never be complete, so emerging entity discovery (EED) is essential for detecting emerging entities (EEs) that are mentioned in text but not yet contained in the KB. In this paper, we propose a new topic-driven approach to EED that represents EEs using context harvested from online Web sources. Experimental results show that our solution outperforms the state-of-the-art methods in terms of F1 measure for the EED task as well as Micro Accuracy and Macro Accuracy in the full EL setting.
Lei Zhang, Tianxing Wu, Liang Xu, Meng Wang, Guilin Qi, Harald Sack

Named Entity Recognition for Open Domain Data Based on Distant Supervision

Named Entity Recognition (NER) for open-domain data is a critical task for natural language processing applications and attracts much research attention. However, the complexity of semantic dependencies and the sparsity of context information make it difficult to identify correct entities in a corpus. In addition, the lack of annotated training data makes it impossible to predict fine-grained entity types for detected entities. To solve these problems in NER, we propose an extractor that takes both the near arguments and the long dependencies of relations into consideration for discovering entity and relation mentions. We then employ distant-supervision methods to automatically label the mention types of the training data sets, and we propose a neural network model for learning the type classifier. Empirical studies on two real-world raw text corpora, NYT and YELP, demonstrate that our proposed NER approach outperforms existing models.
Junshuang Wu, Richong Zhang, Ting Deng, Jinpeng Huai

Geography-Enhanced Link Prediction Framework for Knowledge Graph Completion

Knowledge graphs contain knowledge about the world and provide a structured representation of it. Current knowledge graphs contain only a small subset of what is true in the world. Link prediction approaches aim at predicting new links for a knowledge graph given the existing links among entities. Recent years have witnessed great advances in representation learning (RL) based link prediction models, which represent entities and relations as elements of a continuous vector space. However, current representation learning models ignore the abundant geographic information implicit in entities and relations, so there is still room for improvement. To overcome this problem, this paper proposes a novel link prediction framework for knowledge graph completion. By leveraging geographic information to generate geographic units and rules, we construct geographic constraints for optimizing and boosting the representation learning results. Extensive experiments show that the proposed framework improves the performance of current representation learning models on the link prediction task.
Yashen Wang, Huanhuan Zhang, Haiyong Xie

