
2020 | Book

Big Data Preprocessing

Enabling Smart Data

Authors: Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego, Prof. Salvador García, Prof. Francisco Herrera

Publisher: Springer International Publishing


About this book

This book offers an accessible overview of Big Data preprocessing, including a formal description of each problem, and focuses on the most relevant proposed solutions. It also illustrates actual implementations of algorithms that help the reader deal with these problems.
The book stresses the gap between big, raw data and the quality data that businesses demand. Such quality data is called Smart Data, and preprocessing is the key step to achieve it: imperfections are corrected, integration tasks are performed, and superfluous information is eliminated. The authors present the concept of Smart Data through data preprocessing in Big Data scenarios and connect it with the emerging paradigms of IoT and edge computing, where the end points generate Smart Data without relying completely on the cloud.
Finally, the book covers some novel areas of study that are attracting growing attention in Big Data preprocessing. Specifically, it considers the relation to Deep Learning (a technique that also relies on large volumes of data), the difficulty of selecting and chaining the appropriate preprocessing techniques, and other open problems.
Practitioners and data scientists who work in this field and want an introduction to preprocessing in large-data-volume scenarios will find this book useful, as will researchers who want to know which algorithms are currently implemented to support their investigations.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
We live in a world where data is generated from a myriad of sources, and it is really cheap to collect and store such data. However, the real benefit comes not from the data itself, but from the algorithms capable of processing it in a tolerable elapsed time and extracting valuable knowledge from it. The term “Big Data” has spread rapidly in the framework of data mining and business intelligence. This new scenario can be defined by means of those problems that cannot be effectively or efficiently addressed using the standard computing resources currently available. We must emphasize that Big Data does not just imply large volumes of data, but also the necessity for scalability, i.e., ensuring a response in an acceptable elapsed time. Therefore, the use of Big Data Analytics tools provides very significant advantages to both industry and academia. In this chapter we provide an introduction to Big Data and its problems. Next we discuss a new topic, Big Data Analytics, which refers to the application of machine learning techniques to Big Data problems. Then we continue with a definition of data preprocessing and the different techniques used to improve the quality of data. We finish with an introduction to the state of Big Data streaming.
Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego, Salvador García, Francisco Herrera
Chapter 2. Big Data: Technologies and Tools
Abstract
The fast-evolving Big Data environment has prompted a myriad of tools, paradigms, and techniques to emerge for tackling different use cases in industry and science. However, given this profusion of tools, it is often difficult for practitioners and experts to analyze and select the correct tool for their problems. In this chapter we present an introductory summary of the wide Big Data environment, with the aim of providing algorithm designers the knowledge necessary to develop scalable and efficient machine learning solutions. We start with a discussion of common technical concepts, paradigms, and technologies that form the foundation of frameworks like Spark and Hadoop. Afterwards we analyze in depth the most popular Big Data frameworks and their main components. Next we discuss other novel platforms for high-speed streaming processing that are gaining importance in industry. Finally we compare two of the most relevant large-scale processing platforms today, Spark and Flink; the MapReduce style of processing they build on is sketched below.
Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego, Salvador García, Francisco Herrera
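As a minimal, hedged illustration of the MapReduce-style processing that underlies frameworks like Hadoop and Spark, the following Spark (Scala) sketch counts words in a text file. The file path and the local master setting are placeholders for illustration, not taken from the book.

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; on a cluster the master is set by the launcher.
    val spark = SparkSession.builder()
      .appName("WordCount")
      .master("local[*]")
      .getOrCreate()

    // "input.txt" is a placeholder path.
    val counts = spark.sparkContext.textFile("input.txt")
      .flatMap(_.split("\\s+"))   // map phase: split each line into words
      .map(word => (word, 1))     // emit (word, 1) pairs
      .reduceByKey(_ + _)         // reduce phase: sum the counts per word

    counts.take(10).foreach(println)
    spark.stop()
  }
}
```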
Chapter 3. Smart Data
Abstract
The term Smart Data refers to the challenge of transforming raw data into quality data that can be appropriately exploited to obtain valuable insights. Big Data is focused on volume, velocity, variety, veracity, and value. The idea of Smart Data is to separate the physical properties of the data (volume, velocity, and variety) from its value and veracity. This transformation is the key to moving from Big to Smart Data. Without value and veracity, Big Data becomes an accumulation of raw data from which no knowledge can be extracted. Therefore, Smart Data discovery is tasked with extracting useful information from data, in the form of a subset (big or not), that possesses enough quality for a successful data mining process. The impact of Smart Data discovery on industry and academia is two-fold: higher quality data mining and reduced data storage costs. In this chapter we give an insight into the state of Smart Data. Next, we provide a discussion on how to move from Big to Smart Data. We finish with an introduction to Smart Data and its relation with the Internet of Things.
Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego, Salvador García, Francisco Herrera
Chapter 4. Dimensionality Reduction for Big Data
Abstract
In the new era of Big Data, an exponential increase in volume is usually accompanied by an explosion in the number of features. Dimensionality reduction arises as a possible solution to enable large-scale learning with millions of dimensions. Nevertheless, like any other family of algorithms, reduction methods require an upgrade in their design so that they can work at such magnitudes. In particular, they must be prepared to tackle the explosive combinatorial effects of “the curse of Big Dimensionality” while embracing the benefits of the “blessing side of dimensionality” (poorly correlated features). In this chapter we analyze the problems and benefits derived from “the curse of Big Dimensionality”, and how this problem has spread across many fields, such as the life sciences and the Internet. Then we survey the contributions that address the large-scale dimensionality reduction problem. Next, as a case study, we examine in depth the design and behavior of one of the most popular selection frameworks in this field; a simplified stand-in sketch appears below. Finally, we review the contributions related to dimensionality reduction in Big Data streams.
Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego, Salvador García, Francisco Herrera
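The chapter's case study concerns a specific selection framework whose API is not reproduced here; as a stand-in, the sketch below uses MLlib's built-in chi-squared feature selector to show what distributed feature selection looks like in practice. The toy data follows the shape of the standard Spark documentation example and is not from the book.

```scala
import org.apache.spark.ml.feature.ChiSqSelector
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("FeatureSelection").master("local[*]").getOrCreate()
import spark.implicits._

// Toy labeled data with 4-dimensional feature vectors.
val df = Seq(
  (0.0, Vectors.dense(0.0, 0.0, 18.0, 1.0)),
  (1.0, Vectors.dense(0.0, 1.0, 12.0, 0.0)),
  (1.0, Vectors.dense(1.0, 0.0, 15.0, 0.1))
).toDF("label", "features")

// Keep the two features most dependent on the label (chi-squared test).
val selector = new ChiSqSelector()
  .setNumTopFeatures(2)
  .setFeaturesCol("features")
  .setLabelCol("label")
  .setOutputCol("selectedFeatures")

selector.fit(df).transform(df).show(false)
```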
Chapter 5. Data Reduction for Big Data
Abstract
Data reduction in data mining selects or generates the most representative instances in the input data in order to reduce the original complex instance space and better define the decision boundaries between classes. In theory, reduction techniques should enable the application of learning algorithms to large-scale problems. Nevertheless, standard algorithms suffer from the increase in size and complexity of today’s problems. The objective of this chapter is to provide several ideas, algorithms, and techniques for dealing with the data reduction problem in Big Data. We begin by analyzing the first ideas on scalable data reduction in single-machine environments. Then we present a distributed data reduction method that solves many of the scalability problems of the sequential approaches. Next we provide a use case of data reduction algorithms in Big Data. Lastly, we study a recent development on data reduction for high-speed streaming systems; the partition-wise pattern underlying distributed approaches is sketched below.
Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego, Salvador García, Francisco Herrera
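The distributed method discussed in the chapter is more elaborate; the sketch below only illustrates the divide-and-conquer pattern it relies on, where each partition is reduced locally with mapPartitions and the survivors are kept. The grid-based local reducer is a hypothetical stand-in for a real instance selection method.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("InstanceReduction").master("local[*]").getOrCreate()

case class Instance(features: Array[Double], label: Double)

// Hypothetical local reducer: keep one instance per (label, coarse grid cell).
// A crude stand-in for a real instance selection method run inside each partition.
def reducePartition(it: Iterator[Instance]): Iterator[Instance] =
  it.toSeq
    .groupBy(i => (i.label, i.features.map(v => math.round(v * 10)).toSeq))
    .values.map(_.head)
    .iterator

val data = spark.sparkContext.parallelize(Seq(
  Instance(Array(0.11, 0.52), 0.0),
  Instance(Array(0.12, 0.50), 0.0), // near-duplicate, will be dropped
  Instance(Array(0.90, 0.13), 1.0)
), numSlices = 2)

// Divide and conquer: reduce each partition locally, then keep the union.
val reduced = data.mapPartitions(reducePartition)
println(s"before = ${data.count()}, after = ${reduced.count()}")
```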
Chapter 6. Imperfect Big Data
Abstract
In any knowledge discovery process, the value of the extracted knowledge is directly related to the quality of the data used. Big Data problems, generated by the massive growth in the scale of data observed in recent years, follow the same dictate. A common problem affecting data quality is the presence of noise, particularly in classification problems, where label noise refers to the incorrect labeling of training instances and is known to be a very disruptive feature of data. Another alteration present in data is missing values. They deserve special attention, as they have a critical impact on the learning process: most learners assume that the data is complete. However, in this Big Data era, the massive growth in the scale of the data poses a challenge to the traditional proposals created to tackle noise and missing values, as they have difficulties coping with such a large amount of data. A minimal distributed imputation sketch appears below.
Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego, Salvador García, Francisco Herrera
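For the missing-values half of the problem, MLlib ships a distributed Imputer that replaces NaN entries with a per-column statistic. The sketch below is a minimal usage example, not the book's treatment of noise filtering; the column names and data are invented.

```scala
import org.apache.spark.ml.feature.Imputer
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Imputation").master("local[*]").getOrCreate()
import spark.implicits._

// Toy data where missing values are encoded as Double.NaN.
val df = Seq(
  (1.0, Double.NaN),
  (2.0, 3.0),
  (Double.NaN, 5.0)
).toDF("a", "b")

// Replace each NaN with the column mean ("median" is the other built-in strategy).
val imputer = new Imputer()
  .setInputCols(Array("a", "b"))
  .setOutputCols(Array("a_imputed", "b_imputed"))
  .setStrategy("mean")

imputer.fit(df).transform(df).show()
```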
Chapter 7. Big Data Discretization
Abstract
The data discretization task transforms continuous numerical data into discrete, bounded values, which are more understandable for humans and more manageable for a wide range of machine learning methods. With the advent of Big Data, a new wave of large-scale datasets with a predominance of continuous features has arrived in industry and academia. However, standard discretizers do not respond well to huge sets of continuous points, and novel distributed discretization solutions are in demand. In this chapter, we review the most relevant contributions to this field in the literature. We begin by enumerating the early proposals for parallel discretization. Then, we present some distributed solutions capable of scaling to large-scale datasets. We finish with a study of the discretization methods capable of dealing with Big Data streams; a minimal usage sketch appears below.
Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego, Salvador García, Francisco Herrera
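MLlib's QuantileDiscretizer is one readily available distributed discretizer (equal-frequency binning via approximate quantiles); the supervised methods discussed in the chapter, such as distributed MDLP, live in third-party packages and are not reproduced here. A minimal sketch on invented data:

```scala
import org.apache.spark.ml.feature.QuantileDiscretizer
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Discretization").master("local[*]").getOrCreate()
import spark.implicits._

// A single continuous column to be discretized.
val df = Seq(18.0, 19.0, 8.0, 5.0, 2.2).toDF("hour")

// Equal-frequency binning: cut points are approximate quantiles computed distributedly.
val discretizer = new QuantileDiscretizer()
  .setInputCol("hour")
  .setOutputCol("hourBin")
  .setNumBuckets(3)

discretizer.fit(df).transform(df).show()
```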
Chapter 8. Imbalanced Data Preprocessing for Big Data
Abstract
The negative impact on learning associated with an imbalanced proportion of classes has exploded lately with the exponential growth of “cheap” data. Many real-world problems present a scarce number of instances of one class, whereas the cardinality of the others is several factors greater. The current techniques that treat large-scale imbalanced data focus on obtaining fast, scalable, and parallel sampling techniques following the standard MapReduce procedure. These generate locally balanced solutions in each map, which are eventually combined into a final set. Nevertheless, as we will see later, this divide-and-conquer strategy entails several problems, such as small disjuncts, lack of data, etc. In this chapter we also review the latest proposals on imbalanced Big Data preprocessing and present a MapReduce framework for imbalanced preprocessing which includes several state-of-the-art sampling techniques; the simplest of these, random undersampling, is sketched below.
Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego, Salvador García, Francisco Herrera
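As a hedged sketch of the simplest such technique, random undersampling, the code below uses DataFrame.stat.sampleBy to downsample every class towards the minority class cardinality. The skewed toy data is invented for illustration and is not the framework presented in the chapter.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Undersampling").master("local[*]").getOrCreate()
import spark.implicits._

// Skewed toy data: ~10% minority (label 1.0), ~90% majority (label 0.0).
val df = spark.range(0, 1000)
  .select(($"id" % 10 === 0).cast("double").as("label"), $"id")

val counts = df.groupBy("label").count().as[(Double, Long)].collect().toMap
val minority = counts.values.min.toDouble

// Keep all minority rows; downsample every other class towards the minority cardinality.
val fractions = counts.map { case (label, n) => label -> math.min(1.0, minority / n) }
val balanced = df.stat.sampleBy("label", fractions, seed = 42L)

balanced.groupBy("label").count().show()
```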
Chapter 9. Big Data Software
Abstract
The advent of Big Data has created the need for new computing tools to process huge amounts of data. Apache Hadoop was the first open-source framework to implement the MapReduce paradigm. Apache Spark appeared a few years later, improving on the Hadoop ecosystem. More recently, Apache Flink appeared to tackle the Big Data streaming problem. However, while these frameworks were created to handle huge amounts of data, many practitioners also need machine learning algorithms to extract knowledge from that data. The success of a Big Data framework is strongly related to its machine learning capability. This is why these frameworks nowadays include a Big Data machine learning library: MLlib in the case of Spark, and FlinkML for Flink. In this chapter, we analyze in depth both the MLlib and FlinkML Big Data libraries. We start with a description of Apache Spark MLlib and all of its components. We continue with a description of a Big Data library focused on data preprocessing for Apache Spark, named BigDaPSpark. Next, we provide an extensive analysis of FlinkML and its included algorithms and utilities. Lastly, we finish with the description of a Big Data streaming library focused on data preprocessing for Apache Flink, named BigDaPFlink. A minimal MLlib usage sketch appears below.
Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego, Salvador García, Francisco Herrera
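As a minimal example of MLlib usage (not of BigDaPSpark or FlinkML, whose APIs are not reproduced here), the sketch below assembles two numeric columns into a feature vector and fits a logistic regression inside a Pipeline. Data and column names are invented.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("MLlibDemo").master("local[*]").getOrCreate()
import spark.implicits._

// Invented toy dataset: two numeric attributes and a binary label.
val df = Seq(
  (1.0, 0.1, 0.0),
  (2.0, 1.1, 0.0),
  (8.0, 9.5, 1.0),
  (9.0, 8.7, 1.0)
).toDF("x1", "x2", "label")

// MLlib's DataFrame API expects a single vector column of features.
val assembler = new VectorAssembler()
  .setInputCols(Array("x1", "x2"))
  .setOutputCol("features")

val lr = new LogisticRegression().setMaxIter(10)

// Chain preprocessing and learning stages, as MLlib pipelines are designed for.
val model = new Pipeline().setStages(Array(assembler, lr)).fit(df)
model.transform(df).select("label", "prediction").show()
```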
Chapter 10. Final Thoughts: From Big Data to Smart Data
Abstract
Throughout this book we have presented a complete vision of Big Data preprocessing and how it enables Smart Data. Data is only as valuable as the knowledge and insights we can extract from it. Following the well-known “garbage in, garbage out” principle, accumulating vast amounts of raw data does not guarantee quality results; rather, it yields poor knowledge. In this last chapter we aim to provide a few final thoughts on the importance of data preprocessing, how different carrying it out on Big Data is compared to classical datasets, and some perspectives on the commonalities between Deep Learning and Big Data preprocessing.
Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego, Salvador García, Francisco Herrera
Metadata
Title
Big Data Preprocessing
Authors
Julián Luengo
Diego García-Gil
Sergio Ramírez-Gallego
Prof. Salvador García
Prof. Francisco Herrera
Copyright Year
2020
Electronic ISBN
978-3-030-39105-8
Print ISBN
978-3-030-39104-1
DOI
https://doi.org/10.1007/978-3-030-39105-8
