Knowledge Discovery from Multi-Sourced Data

Authors: Dr. Chen Ye, Prof. Hongzhi Wang, Prof. Guojun Dai

Publisher: Springer Nature Singapore

Book Series: SpringerBriefs in Computer Science

Part of: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

This book addresses several knowledge discovery problems on multi-sourced data, drawing together theories, techniques, and methods from data cleaning, data mining, and natural language processing. It focuses on three data models: multi-sourced isomorphic data, multi-sourced heterogeneous data, and text data. On the basis of these three data models, the book studies knowledge discovery problems, including truth discovery and fact discovery, on multi-sourced data with respect to four important properties: relevance, inconsistency, sparseness, and heterogeneity. It is useful for specialists as well as graduate students.

Data, even when describing the same object or event, can come from a variety of sources, such as crowd workers and social media users, and noisy pieces of data or information are unavoidable. Given the daunting scale of data, it is unrealistic to expect humans to “label” the data or tell which data source is more reliable. Hence, it is crucial to identify trustworthy information from multiple noisy information sources, a task referred to as knowledge discovery. At present, knowledge discovery research for multi-sourced data faces two main challenges. On the structural level, it is essential to consider the different characteristics of data composition and application scenarios and to define the knowledge discovery problem for each setting. On the algorithm level, the knowledge discovery task needs to consider different levels of information conflict and to design efficient algorithms that mine more valuable information using multiple clues. Existing knowledge discovery methods have defects on both the structural level and the algorithm level, leaving the knowledge discovery problem far from fully solved.

Frontmatter

Chapter 1. Introduction

Abstract

In the age of information explosion, data has penetrated every aspect of our lives. Different data sources, such as social networks, sensing devices, and crowdsourcing platforms, constantly generate data. Even for the same object, various data sources provide information. Intuitively, analyzing these multi-source data yields valuable information. On the personal level, enterprises can recommend targeted products by analyzing the comments of their target customers on multiple platforms. On the group level, by analyzing the characteristics of massive amounts of multi-source data, government departments can make reasonable policy decisions, and researchers can achieve novel findings. Based on these observations, an intelligent decision-making model with multi-source data at its core is gradually replacing the traditional manual decision-making mode. This chapter discusses the background of knowledge discovery from multi-source data. In Sect. 1.1, we analyze the quality of multi-source data to motivate the need to discover useful information from noisy sources. In Sect. 1.2, we summarize the existing studies and examine their drawbacks. We conclude the chapter in Sect. 1.3 with an overview of the structure of this book.