
About this book

This book constitutes the refereed proceedings of the first China Conference on Knowledge Graph and Semantic Computing, CCKS, held in Beijing, China, in September 2016.

The 19 revised full papers presented together with 6 shared tasks were carefully reviewed and selected from numerous submissions. The papers are organized in topical sections on knowledge representation and learning; knowledge graph construction and information extraction; linked data and knowledge-based systems; shared tasks.

Table of Contents


Knowledge Representation and Learning


A Joint Embedding Method for Entity Alignment of Knowledge Bases

We propose a model that jointly learns the embeddings of multiple knowledge bases (KBs) in a uniform vector space in order to align entities across KBs. Instead of relying on content-similarity-based methods, we argue that the structural information of KBs is also important for KB alignment: in cross-lingual settings, or when the KBs use different encodings, structural information is often the only signal we can leverage. We utilize seed entity alignments whose embeddings are constrained to be identical during joint learning. We perform experiments on two datasets: a subset of Freebase comprising 15 thousand selected entities, and a dataset we construct from the real-world large-scale KBs Freebase and DBpedia. The results show that the proposed approach works well even though it utilizes only the structural information of the KBs.
Yanchao Hao, Yuanzhe Zhang, Shizhu He, Kang Liu, Jun Zhao
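The core idea of the abstract above, tying seed alignments to a single shared embedding within a translation-based model, can be sketched as follows. This is our own minimal illustration, not the authors' code; the entities, the relation vector, and the TransE-style scoring function are all assumptions.

```python
# Minimal sketch (not the paper's implementation): two KBs embedded in one
# vector space, with seed entity pairs forced to share a single embedding.
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Hypothetical entities; ("Beijing_kb1", "Beijing_kb2") is a seed alignment.
entities_kb1 = ["Beijing_kb1", "China_kb1"]
entities_kb2 = ["Beijing_kb2", "PRC_kb2"]
seed_pairs = [("Beijing_kb1", "Beijing_kb2")]

emb = {e: rng.normal(size=dim) for e in entities_kb1 + entities_kb2}
# Tie seed embeddings: both names point to the SAME vector object, so any
# gradient update to one is an update to the other.
for e1, e2 in seed_pairs:
    emb[e2] = emb[e1]

def transe_score(h, r, t):
    """TransE-style plausibility: smaller ||h + r - t|| means more plausible."""
    return float(np.linalg.norm(h + r - t))

r_capital = rng.normal(size=dim)  # stand-in relation vector
score = transe_score(emb["Beijing_kb2"], r_capital, emb["China_kb1"])
shared = bool(np.allclose(emb["Beijing_kb1"], emb["Beijing_kb2"]))
```

Because the seed pair shares one vector, triples from either KB constrain the same point in the joint space, which is what allows alignment signal to propagate through structure alone.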

A Multi-dimension Weighted Graph-Based Path Planning with Avoiding Hotspots

With the rapid development of industrialization, vehicles have become an important part of people’s lives, but transportation systems are becoming more and more complicated. A core problem in such complicated transportation systems is how to avoid hotspots. In this paper, we present a graph model based on a multi-dimension weighted graph for path planning that avoids hotspots. Firstly, we extend one-dimension weighted graphs to multi-dimension weighted graphs, where multi-dimension weights characterize more features of transportation. Secondly, we develop a framework equipped with a collection of aggregate functions for transforming multi-dimension weighted graphs into one-dimension weighted graphs, in order to convert path planning over multi-dimension weighted graphs into the shortest path problem over one-dimension weighted graphs. Finally, we implement the proposed framework and evaluate our system on several practical examples. The experiments show that our approach can provide “optimal” paths while taking hotspot avoidance into account.
Shuo Jiang, Zhiyong Feng, Xiaowang Zhang, Xin Wang, Guozheng Rao
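The transformation described above, aggregating a multi-dimension edge weight into a scalar and then running an ordinary shortest-path algorithm, can be sketched in a few lines. The toy road graph, the two weight dimensions, and the weighted-sum aggregate function are our own assumptions, not the paper's.

```python
# Sketch: collapse a multi-dimension edge weight (distance, hotspot penalty)
# into one scalar via an aggregate function, then run Dijkstra as usual.
import heapq

# Hypothetical road graph: edge -> (distance_km, hotspot_penalty)
graph = {
    "A": {"B": (1.0, 5.0), "C": (2.0, 0.0)},
    "B": {"D": (1.0, 5.0)},
    "C": {"D": (2.0, 0.0)},
    "D": {},
}

def aggregate(weights, alpha=1.0):
    """One possible aggregate function: a weighted sum of the dimensions."""
    distance, hotspot = weights
    return distance + alpha * hotspot

def dijkstra(graph, src, dst, agg):
    """Standard Dijkstra over the aggregated one-dimension weights."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + agg(w)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

# With hotspot penalties counted, the longer-but-calm route A->C->D wins
# (cost 4.0) over the shorter-but-congested A->B->D (cost 12.0).
cost = dijkstra(graph, "A", "D", aggregate)
```

Changing `alpha` trades distance against hotspot avoidance, which mirrors the role of the framework's interchangeable aggregate functions.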

Position Paper: The Unreliability of Language - A Common Issue for Knowledge Engineering and Buddhism

According to the studies of Kurt Gödel and Ludwig Wittgenstein, both formal languages and human languages are unreliable. This finding inherently influences the development of artificial intelligence and knowledge engineering. On the other hand, this finding, i.e., the unreliability of languages, was discussed much earlier by Gautama Buddha, the founder of Buddhism. In this paper, we discuss the unreliability of language by bridging the perspectives of Gödel, Wittgenstein, and Gautama. Based on this discussion, we offer some philosophical thoughts from the perspective of knowledge engineering.
Zhangquan Zhou, Guilin Qi

Construction of Domain Ontology for Engineering Equipment Maintenance Support

To address the problems in the domain of engineering equipment maintenance, such as the large number of knowledge points, the broad scope, the complex relationships, and the difficulty of sharing and reuse, this paper proposes the category and professional scope of an engineering equipment maintenance ontology. It analyzes the knowledge sources; extracts eight core concepts, namely case, product, function, damage, environment, phenomenon, disposal, and resource; and builds a concept hierarchy model from them. It then analyzes the data properties and object properties of the core concepts and constructs the engineering equipment maintenance ontology with Protégé 4.3, laying a solid foundation for the knowledge base and for engineering equipment maintenance application ontologies.
YongHua Zeng, JianDong Zhuang, ZhengLian Su

Knowledge Graph Construction and Information Extraction


Boosting to Build a Large-Scale Cross-Lingual Ontology

Global knowledge sharing makes large-scale multi-lingual knowledge bases an extremely valuable resource in the Big Data era. However, current mainstream Wikipedia-based multi-lingual ontologies still face the following problems: the scarcity of non-English knowledge, the noise in the multi-lingual ontology schema relations, and the limited coverage of cross-lingual owl:sameAs relations. Building a cross-lingual ontology based on other large-scale heterogeneous online wikis is a promising solution to these problems. In this paper, we propose a cross-lingual boosting approach to iteratively reinforce the performance of ontology building and instance matching. Our experiments produce an ontology containing over 3,520,000 English instances, over 800,000 Chinese instances, and over 150,000 cross-lingual instance alignments. The F1-measure of Chinese instanceOf prediction improves by up to 32%.
Zhigang Wang, Liangming Pan, Juanzi Li, Shuangjie Li, Mingyang Li, Jie Tang

Large Scale Semantic Relation Discovery: Toward Establishing the Missing Link Between Wikipedia and Semantic Network

Wikipedia is the largest knowledge repository on the Web. However, most of the semantic knowledge in Wikipedia is documented in natural language, which is readable by humans but largely incomprehensible to computers. To establish the missing link from Wikipedia to a semantic network, this paper proposes a relation discovery method which can: (1) discover and characterize a large collection of relations from Wikipedia by exploiting relation pattern regularity, relation distribution regularity, and relation instance redundancy; and (2) annotate the hyperlinks between Wikipedia articles with the discovered semantic relations. In total, we discover 14,299 relations, 105,661 relation patterns, and 5,214,175 relation instances from Wikipedia, which will be a valuable resource for many NLP and AI tasks.
Xianpei Han, Xiliang Song, Le Sun

Biomedical Event Trigger Detection Based on Hybrid Methods Integrating Word Embeddings

Trigger detection, as the preceding task, is of great importance in biomedical event extraction. To date, most state-of-the-art systems have been based on single classifiers, and one-hot word encodings are unable to represent semantic information. In this paper, we utilize hybrid methods integrating word embeddings to achieve higher performance. In the hybrid methods, multiple single classifiers are first constructed based on rich manual features, including dependency and syntactic parsing results. The multiple prediction results are then integrated by set operations, voting, and stacking. Hybrid methods can exploit the differences among classifiers and compensate for their individual deficiencies, thus improving performance. Word embeddings are learnt from large-scale unlabeled texts and integrated as unsupervised features with the other rich features based on dependency parse graphs, so that much semantic information can be represented. Experimental results show that our method outperforms state-of-the-art systems.
Lishuang Li, Meiyue Qin, Degen Huang
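The voting integration described above can be illustrated with a toy example. The classifier names and trigger labels here are invented for illustration; only the majority-voting mechanism itself reflects the abstract.

```python
# Toy sketch of the voting step: several single classifiers predict a trigger
# label per token, and majority voting fuses them into one label sequence.
from collections import Counter

def majority_vote(predictions):
    """predictions: list of per-classifier label lists; returns fused labels."""
    fused = []
    for labels in zip(*predictions):
        fused.append(Counter(labels).most_common(1)[0][0])
    return fused

# Hypothetical per-token outputs of three single classifiers.
svm_out = ["None", "Gene_expression", "None"]
crf_out = ["None", "Gene_expression", "Binding"]
nn_out  = ["Binding", "Gene_expression", "None"]

fused = majority_vote([svm_out, crf_out, nn_out])
```

The fused sequence keeps a label only where classifiers agree, which is how an ensemble smooths over individual classifiers' mistakes.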

GRU-RNN Based Question Answering Over Knowledge Base

Building systems that can answer questions in natural language is one of the most important natural language processing applications. Recently, the rise of large-scale open-domain knowledge bases has provided a new possible approach. Some existing systems conduct question answering relying on hand-crafted features and rules, while other work tries to extract features with popular neural networks. In this paper, we adopt a recurrent neural network to understand questions and to find the corresponding answer entities in knowledge bases, based on word embeddings and knowledge base embeddings. Question-answer pairs are used to train our multi-step system. We evaluate our system on FREEBASE and WEBQUESTIONS. The experimental results show that our system achieves performance comparable to the baseline method with a more straightforward structure.
Shini Chen, Jianfeng Wen, Richong Zhang
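The answer-selection step implied above, scoring candidate KB entities against an encoded question, can be sketched with cosine similarity. The question vector here is a stand-in for a GRU encoder's output, and the candidate entities and their embeddings are invented; none of this is the authors' model.

```python
# Sketch: rank candidate answer entities by similarity between the (encoded)
# question vector and each entity's KB embedding.
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

question_vec = np.array([0.9, 0.1])   # pretend GRU encoding of the question
candidates = {
    "Paris": np.array([1.0, 0.0]),    # pretend KB embeddings
    "Berlin": np.array([0.0, 1.0]),
}

# Pick the entity whose embedding best matches the question encoding.
answer = max(candidates, key=lambda e: cosine(question_vec, candidates[e]))
```

In a trained system, both the question encoder and the entity embeddings would be learned jointly from question-answer pairs rather than fixed by hand.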

Towards Personal Relation Extraction Based on Sentence Pattern Tree

Extracting personal relation triples (S, P, O) from large amounts of unstructured text is crucial to knowledge graph construction and to the representation of and reasoning about personal relations. To address the low accuracy of triple extraction from unstructured text, we present a supervised approach for judging whether extracted triples are correct. The approach first builds a knowledge base containing people's attributes; a sentence pattern tree is then learnt from this attribute knowledge base and the training data. During training, triples are extracted from the text automatically and labelled as correct or not manually. Patterns are then constructed layer by layer according to the positions of triples, pronouns, and words in a sentence, and the numbers of correct and incorrect triples are recorded on each pattern. During testing, the correctness of a triple can be judged from the counts recorded in the matched patterns. The test results show that our approach outperforms an ordinary feature-engineering approach in training time, testing time, and F1-value (76.6% vs. 75.7%). Finally, we use the judgement of the sentence pattern tree as a feature to improve the feature-engineering approach (to 77.5%). In addition, our approach is more extensible than the traditional one and offers guidance for constructing the training set.
Zhao Jiapeng, Yan Yang, Liu Tingwen, Shi Jinqiao

An Initial Ingredient Analysis of Drugs Approved by China Food and Drug Administration

Drugs are an important part of medicine. Drug knowledge bases that organize and manage drugs have attracted considerable attention and have been widely used in human health care in many countries and regions, and a large number of electronic drug knowledge bases are publicly available. In China, however, there is hardly any publicly available, well-structured drug knowledge base, possibly due to the coexistence of two different types of medicine: Chinese traditional medicine (CTM) and modern medicine (ME). In order to build an electronic knowledge base of drugs approved by the China Food and Drug Administration (CFDA), we developed a preliminary drug ingredient analysis system. The system collects all drug names from the CFDA website, obtains their manuals from three medical websites, extracts the ingredients of the drugs, and analyses the distribution of the extracted ingredients. In total, 12,918 out of 19,490 drug manuals were collected. Evaluation on 50 randomly selected drug manuals shows that the system achieves an F-score of 95.46% on ingredient extraction. From the distribution of the extracted ingredients, we find that ingredient multiplexing is very common, especially in herbal medicine. This may provide a clue for drug safety, as taking more than one drug that partially shares ingredients may lead to an overdose of those ingredients.
Haodi Li, Qingcai Chen, Buzhou Tang, Dong Huang, Xiaolong Wang, Zengjian Liu

A Tableau-Based Forgetting in ALCQ

Forgetting is a useful tool for tailoring ontologies by reducing the number of concepts and roles. The issue of forgetting for general ontologies in more expressive description logics, such as \(\mathcal {ALCQ}\) and \(\mathcal {SHIQ}\), is largely unexplored. In this paper, we develop a decidable, sound, and complete tableau-based algorithm to implement forgetting-based reasoning. Our tableau algorithm can feasibly be extended to explore forgetting in more expressive ontology languages.
Hong Fang, Xiaowang Zhang

Mining RDF Data for OWL2 RL Axioms

The large amounts of linked data are a valuable resource for the development of semantic applications. However, these applications often face challenges posed by flawed or incomplete schemas, which can lead to the loss of meaningful facts. Association rule mining has been applied to learn many types of axioms. In this paper, we first use a statistical approach based on association rule mining to enrich OWL ontologies. We then propose some improvements to this approach. Finally, we assess the quality of the acquired axioms through evaluations on DBpedia datasets.
Yuanyuan Li, Huiying Li, Jing Shi
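The association-rule view of axiom mining can be illustrated with support and confidence over type assertions. The toy entities, the class names, and the single-axiom shape (subClassOf from co-typing) are our own simplifications; the paper mines a broader range of OWL 2 RL axioms.

```python
# Illustrative sketch: score a candidate owl:subClassOf axiom by the
# confidence of the rule body => head over instance type assertions.
type_assertions = {
    "e1": {"City", "Place"},
    "e2": {"City", "Place"},
    "e3": {"City", "Place"},
    "e4": {"Place"},
}

def confidence(body, head, assertions):
    """conf(body => head) = |instances with both types| / |instances with body|."""
    body_n = sum(1 for types in assertions.values() if body in types)
    both_n = sum(1 for types in assertions.values()
                 if body in types and head in types)
    return both_n / body_n if body_n else 0.0

# Every City is also a Place here, so "City subClassOf Place" scores 1.0,
# while the reverse direction scores lower and would be rejected.
conf_city_place = confidence("City", "Place", type_assertions)
conf_place_city = confidence("Place", "City", type_assertions)
```

A mining system would keep only candidate axioms whose support and confidence exceed chosen thresholds, which is the statistical filter the abstract refers to.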

A Mixed Method for Building the Uyghur and Chinese Domain Ontology

The study of multilingual ontologies in professional fields is relatively rare, and the few that exist mostly concern the public domain. This paper describes a mixed method for building a new multilingual ontology. Using this method, we construct a Uyghur-Chinese (UC) bilingual ontology for the university management field by aligning and mapping the concepts and relations between the ontologies of the different languages and then merging them into a single multilingual ontology. Finally, we preliminarily realize semantic querying with SPARQL, which can provide basic support for cross-lingual retrieval of minority languages in professional fields.
Yilahun Hankiz, Imam Seyyare, Hamdulla Askar

Linked Data and Knowledge-Based Systems


Link Prediction via Mining Markov Logic Formulas to Improve Social Recommendation

Social networks have become a main way to obtain information in recent years, but the sheer amount of information prevents people from finding what they are really interested in. Social recommendation systems were introduced to solve this problem and bring a new challenge: predicting people's preferences. From a graph perspective, social recommendation can be viewed as a link prediction task on the social graph, so link prediction techniques can be applied to it. In this paper, we propose a novel approach that brings logic formulas into a social recommendation system and improves the accuracy of recommendations. The approach consists of two parts: (1) it treats the whole social network, with its various attributes, as a semantic network and finds frequent structures as logic formulas via random graph algorithms; (2) it builds a Markov Logic Network to model the logic formulas, attaches a weight to each formula to measure its contribution, and learns the weights discriminatively from training data. In addition, a weighted formula can be viewed as the reason why a person should accept a specific recommendation, and presenting it may increase the probability that the recommendation is accepted. We carry out several experiments to explore and analyze how various factors of our method affect the recommendation results, and we compare the final method with baselines.
Zhuoyu Wei, Jun Zhao, Kang Liu, Shizhu He

Graph-Based Jointly Modeling Entity Detection and Linking in Domain-Specific Area

Current state-of-the-art Entity Detection and Linking (EDL) systems are geared towards general corpora and cannot be applied directly and effectively to specific domains, because domain-specific texts are often noisy and contain phrases with ambiguous meanings that traditional EDL methods easily recognize as entity mentions but that should not actually be linked to real entities (i.e., False Entity Mentions (FEMs)). Moreover, in most of the current EDL literature, Entity Detection (ED) and Entity Linking (EL) are treated as equally important but separate problems and are typically performed in a pipeline architecture, without considering the mutual dependency between the two tasks. To rigorously address the domain-specific EDL problem, we propose an iterative graph-based algorithm that jointly models the ED and EL tasks in a specific domain by capturing the local dependency of mention-to-entity and the global interdependency of entity-to-entity. We extensively evaluated the algorithm on a dataset of real-world movie comments; the experimental results show that it significantly outperforms the baselines, achieving an 82.7% F1 score for ED and 89.0% linking accuracy for EL.
Jiangtao Zhang, Juanzi Li

LD2LD: Integrating, Enriching and Republishing Library Data as Linked Data

The development of digital libraries increases the need for integrating, enriching, and republishing library data as Linked Data. Linked library data can provide high-quality and more tailored services for library management agencies as well as for the public. However, even though many datasets contain metadata about publications and researchers, it is cumbersome to integrate and analyze them, since collection is still a manual process and the sources are not connected to each other upfront. In this paper, we present an approach for integrating, enriching, and republishing library data as Linked Data. In particular, we first adopt duplicate detection and disambiguation techniques to reconcile researcher data, and we then connect researcher data with publication data such as papers, patents, and monographs using entity linking methods. After that, we use simple reasoning to predict missing values and enrich the library data with external data. Finally, we republish the integrated and enriched library data as Linked Data.
Qingliang Miao, Ruiyu Fang, Lu Fang, Yao Meng, Chenying Li, Mingjie Han, Yong Zhao

Object Clustering in Linked Data Using Centrality

Large-scale linked data is becoming a challenge for many Semantic Web tasks. While graph clustering has been deeply researched in network science and machine learning, little research has been carried out on clustering in linked data. To identify meta-structures in large-scale linked data, the scalability of clustering must be considered. In this paper, we propose a scalable approach to centrality-based clustering that works on an Object Graph model derived from the RDF graph. The centrality of objects is calculated as an indicator for clustering, and both relational and linguistic closeness between objects are considered in order to produce coherent clusters.
Xiang Zhang, Yulian Lv, Erjing Lin
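The role of centrality as a clustering indicator can be sketched with degree centrality over a tiny object graph. The mini-graph and the choice of degree centrality (rather than whatever centrality measure the paper uses) are our assumptions for illustration.

```python
# Sketch: compute degree centrality over an object graph and use the most
# central objects as cluster seeds.
object_graph = {
    "Berlin": {"Germany", "Brandenburg_Gate"},
    "Germany": {"Berlin", "Munich"},
    "Munich": {"Germany"},
    "Brandenburg_Gate": {"Berlin"},
}

def degree_centrality(graph):
    """Degree centrality: neighbour count normalized by (n - 1)."""
    n = len(graph) - 1
    return {v: len(nbrs) / n for v, nbrs in graph.items()}

centrality = degree_centrality(object_graph)
# High-centrality objects act as seeds; low-centrality objects are attached
# to the seed they are closest to (relationally and linguistically).
seed = max(centrality, key=centrality.get)
```

Degree centrality is cheap to compute, which matters for the scalability concern the abstract raises; richer measures trade cost for cluster quality.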

Research on Knowledge Fusion Connotation and Process Model

The emergence of big data brings diversified structures and constant growth of knowledge. The objective of knowledge fusion (KF) research is to integrate, discover, and exploit valuable knowledge from distributed, heterogeneous, and autonomous knowledge sources, which is a necessary prerequisite and an effective approach for implementing knowledge services. In order to put KF into practice, this paper first discusses the connotations of KF by analysing the relations and differences among various notions, i.e., knowledge fusion, knowledge integration, information fusion, and data fusion. Then, based on ontology-based knowledge representation, it investigates several KF implementation patterns and provides two types of dimensional KF process models oriented to the demands of knowledge services.
Hao Fan, Fei Wang, Mao Zheng

E-SKB: A Semantic Knowledge Base for Emergency

Although the number of knowledge bases in Linked Open Data has grown explosively, there are few knowledge bases about emergencies, an important issue in the area of social management. In this paper, we introduce a semantic knowledge base for emergencies, extracted from an authoritative website. Based on the characteristics of the website, we propose a framework for converting its web pages into RDF. To help researchers acquire more knowledge, we follow the publishing rules of Linked Open Data: we not only use URIs to label the objects in the knowledge base but also provide links to DBpedia. Finally, we employ Sesame to store and publish the semantic knowledge base, and we develop a query interface for retrieving from it with SPARQL.
Chang Wen, Yu Liu, Jinguang Gu, Jing Chen, Yingping Zhang

CCKS 2016 Shared Tasks


ICRC-DSEDL: A Film Named Entity Discovery and Linking System Based on Knowledge Bases

Named entity discovery and linking are hot topics in text mining and are very important for text understanding, as named entities usually appear in various formats and some of them are ambiguous. To accelerate the development of related technology, the China Conference on Knowledge Graph and Semantic Computing (CCKS) 2016 launched a competition that includes a task on film named entity discovery and linking (task 1). We participated in this competition and developed a system for task 1. The system consists of two parts, for named entity discovery (NED) and entity linking (EL) respectively. The first part is a hybrid subsystem based on a conditional random field (CRF) and a structural support vector machine (SSVM) with rich features; the second part is a ranking subsystem in which not only the given knowledge base but also open knowledge bases are used for candidate generation, and SVMrank is used for candidate ranking. On the official test dataset of task 1 of the CCKS 2016 competition, our system achieves an F1-score of 77.83% on NED, an accuracy of 86.53% on EL, and an overall F1-score of 67.35%.
YaHui Zhao, Haodi Li, Qingcai Chen, Jianglu Hu, Guangpeng Zhang, Dong Huang, Buzhou Tang

Domain-Specific Entity Discovery and Linking Task

This paper describes the TEDL system for entity discovery and linking, which competed in the CCKS 2016 domain-specific entity discovery and linking task. Given a review text and a movie knowledge base (MKB) pre-constructed from the Douban website, the system must first detect all entity mentions and then link them to entities in the MKB. Traditional named entity detection (NED) and entity linking (EL) techniques cannot be applied to a domain-specific knowledge base effectively: most existing techniques simply take the extracted named entities as input to the EL task, without considering the interdependency between NED and EL or how to detect Fake Named Entities (FNEs) [1]. In this paper, we employ the method described in [1] to jointly model the two procedures as our basic system. In addition, we use the basic system's output as features to train further models, and we finally ensemble all the models' outputs to predict FNEs. The experimental results show an 80.30% NED F1 score and 93.45% EL accuracy, which is better than traditional methods.
Tao Yang, Feng Zhang, Xiao Li, Qianghuai Jia, Ce Wang

Knowledge Base Completion via Rule-Enhanced Relational Learning

Traditional relational learning techniques perform the knowledge base (KB) completion task based solely on observed facts, ignoring rich domain knowledge that could be extremely useful for inference. In this paper, we encode domain knowledge as simple rules and propose rule-enhanced relational learning for KB completion. The key idea is to use the rules to further refine the inference results given by traditional relational learning techniques, and hence improve their inference accuracy. Facts inferred in this way are the most preferred by relational learning while also complying with all the rules. Experimental results show that, by incorporating domain knowledge, our approach achieves the best overall performance in the CCKS 2016 competition.
Shu Guo, Boyang Ding, Quan Wang, Lihong Wang, Bin Wang
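The refinement step described above can be illustrated with one hard rule applied to scored candidate facts. The facts, the scores, and the "at most one birthplace" rule are invented for illustration and stand in for whatever domain rules and relational learning model the paper actually uses.

```python
# Sketch: a hard rule filters the candidate facts proposed by a
# (hypothetical) relational learning model, keeping only rule-compliant
# facts that the model scores highest.
candidate_facts = {
    ("Alice", "bornIn", "Paris"): 0.9,
    ("Alice", "bornIn", "London"): 0.7,
}

def rule_at_most_one_birthplace(facts):
    """Rule: a person has at most one birthplace; keep the best-scored one."""
    best = {}
    for (head, rel, tail), score in facts.items():
        if rel == "bornIn":
            if head not in best or score > best[head][1]:
                best[head] = (tail, score)
    return {(h, "bornIn", t): s for h, (t, s) in best.items()}

refined = rule_at_most_one_birthplace(candidate_facts)
```

The surviving facts are exactly those "most preferred by relational learning while complying with the rules", which is the key idea the abstract states.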

Knowledge Graph Embedding for Link Prediction and Triplet Classification

Link prediction (LP) and triplet classification (TC) are important tasks in the field of knowledge graph mining. However, the traditional link prediction methods for social networks cannot be directly applied to knowledge graph data, which contains multiple relations. In this paper, we apply a knowledge graph embedding method to solve these tasks on a Chinese knowledge base. The proposed method was successfully used in the CCKS 2016 evaluation task, and we expect it to achieve excellent performance.
E. Shijia, Shengbin Jia, Yang Xiang, Zilian Ji
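Once entities and relations are embedded, triplet classification reduces to thresholding a score, which can be sketched as follows. The embeddings, the threshold, and the TransE-style distance score are illustrative assumptions, not the authors' trained model.

```python
# Sketch: triplet classification by thresholding a TransE-style distance.
# The embedding vectors below are made up for illustration.
import numpy as np

emb = {
    "Beijing": np.array([1.0, 0.0]),
    "China": np.array([1.0, 1.0]),
    "capitalOf": np.array([0.0, 1.0]),
    "Tokyo": np.array([5.0, 5.0]),
}

def is_valid(h, r, t, threshold=0.5):
    """Classify (h, r, t) as true iff ||h + r - t|| falls below the threshold."""
    return float(np.linalg.norm(emb[h] + emb[r] - emb[t])) < threshold

true_triplet = is_valid("Beijing", "capitalOf", "China")
false_triplet = is_valid("Tokyo", "capitalOf", "China")
```

Link prediction uses the same score without a threshold: candidate tails are ranked by distance and the closest one is predicted.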

Product Forecasting Based on Average Mutual Information and Knowledge Graph

This paper presents a method for modeling the training data provided by the China Conference on Knowledge Graph and Semantic Computing (CCKS), based on average mutual information and a knowledge graph. First, we calculate the contribution of each product attribute to the product categories and establish a product prediction model. Then we construct a knowledge graph of the training samples, a network connecting product attributes and categories. The average mutual information between attributes and categories provides contribution values for the prediction model, and the product knowledge graph effectively limits the number of candidate product categories. This is an attempt to integrate a product forecasting algorithm with a knowledge graph. Evaluation on the data released by CCKS 2016 shows that the classification model combining average mutual information and the knowledge graph achieves high efficiency and accuracy.
Zili Zhou, Zhen Zou, Junyi Liu, Yun Zhang
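The attribute-contribution computation described above can be sketched as mutual information between an attribute value and the product category, estimated from co-occurrence counts. The toy records and the plain (unaveraged, binary-log) mutual information estimate are our own simplifications of the paper's measure.

```python
# Sketch: estimate the mutual information between product attribute values
# and product categories from toy co-occurrence counts.
import math
from collections import Counter

# Hypothetical records: (attribute value, category)
records = [("steel", "hardware"), ("steel", "hardware"),
           ("cotton", "textile"), ("cotton", "textile")]

def mutual_information(records):
    """I(A; C) = sum over (a, c) of p(a, c) * log2(p(a, c) / (p(a) p(c)))."""
    n = len(records)
    p_a = Counter(a for a, _ in records)
    p_c = Counter(c for _, c in records)
    p_ac = Counter(records)
    mi = 0.0
    for (a, c), n_ac in p_ac.items():
        joint = n_ac / n
        mi += joint * math.log2(joint / ((p_a[a] / n) * (p_c[c] / n)))
    return mi

# Attribute value fully determines the category here, so MI is maximal
# (1 bit for two equiprobable categories).
mi = mutual_information(records)
```

An attribute with high mutual information contributes strongly to the prediction model; uninformative attributes score near zero and can be down-weighted.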

Product Prediction with Deep Neural Networks

In this paper, we present a solution to the product prediction shared task of CCKS 2016. The main purpose of the task is to determine the product categories for import and export transaction records. For this dataset, we apply deep neural networks to solve the multi-label classification problem. On the training set, our proposed method achieves a precision of 0.90, and the model also performs well on the test set.
E. Shijia, Yang Xiang

