2016 | Book

Big Data Concepts, Theories, and Applications

About this book

This book covers three major parts of Big Data: concepts, theories and applications. Written by world-renowned leaders in Big Data, this book explores the problems, possible solutions and directions for Big Data in research and practice. It also focuses on high level concepts such as definitions of Big Data from different angles; surveys in research and applications; and existing tools, mechanisms, and systems in practice. Each chapter is independent from the other chapters, allowing users to read any chapter directly.

After examining the practical side of Big Data, this book presents theoretical perspectives. The theoretical research ranges from Big Data representation, modeling and topology to distribution and dimension reduction. Chapters also investigate the many disciplines that involve Big Data, such as statistics, data mining, machine learning, networking, algorithms, security and differential geometry. The last section of this book introduces Big Data applications from different communities, such as business, engineering and science.

Big Data Concepts, Theories and Applications is designed as a reference for researchers and advanced level students in computer science, electrical engineering and mathematics. Practitioners who focus on information systems, big data, data mining, business analysis and other related fields will also find this material valuable.

Table of contents

Frontmatter
Chapter 1. Big Continuous Data: Dealing with Velocity by Composing Event Streams
Abstract
The rate at which we produce data is growing steadily, creating ever larger streams of continuously evolving data. Online news, micro-blogs and search queries are just a few examples of these continuous streams of user activities. The value of these streams lies in their freshness and relatedness to ongoing events. Modern applications consuming these streams need to extract behaviour patterns that can be obtained by aggregating and mining huge event histories, both statically and dynamically. An event is the notification that a happening of interest has occurred. Event streams must be combined or aggregated to produce more meaningful information. By combining and aggregating them, either from multiple producers or from a single one during a given period of time, a limited set of events describing meaningful situations may be notified to consumers. With their volume and continuous production, event streams relate mainly to two of the characteristics given to Big Data by the 5V’s model: volume and velocity. Techniques such as complex pattern detection, event correlation, event aggregation, event mining and stream processing have been used for composing events. Nevertheless, to the best of our knowledge, few approaches integrate different composition techniques (online and post-mortem) for dealing with Big Data velocity. This chapter gives an analytical overview of event stream processing and composition approaches: complex event languages, services and event querying systems on distributed logs. Our analysis underlines the challenges introduced by Big Data velocity and volume and uses them as a reference for identifying the scope and limitations of results stemming from different disciplines: networks, distributed systems, stream databases, event composition services, and data mining on traces.
Genoveva Vargas-Solar, Javier A. Espinosa-Oviedo, José Luis Zechinelli-Martini
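The Chapter 1 abstract describes aggregating events from one or several producers over a given period of time. A minimal sketch of that idea, assuming a simple tumbling (fixed, non-overlapping) window and a hypothetical event layout of (timestamp in seconds, producer, kind), might look as follows. The function name and the event fields are illustrative assumptions and are not taken from the chapter.

```python
from collections import Counter


def aggregate_by_window(events, window_seconds):
    """Count events per (window start, event kind) using tumbling windows.

    `events` is an iterable of (timestamp_seconds, producer, kind) tuples.
    This layout is a hypothetical one chosen for illustration.
    """
    counts = Counter()
    for timestamp, _producer, kind in events:
        # Align the timestamp to the start of its fixed-size window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, kind)] += 1
    return dict(counts)


# Events from two hypothetical producers, aggregated over 10-second windows.
events = [
    (0, "p1", "click"),
    (5, "p2", "click"),
    (12, "p1", "click"),
    (13, "p1", "search"),
]
summary = aggregate_by_window(events, 10)
```

Here the four raw events collapse into three aggregated entries, one per window and event kind, which is the kind of reduced, more meaningful event set the abstract refers to.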
Chapter 2. Big Data Tools and Platforms
Abstract
The fast-evolving Big Data Tools and Platforms space has given rise to various technologies to deal with different Big Data use cases. However, because of the multitude of tools and platforms involved, it is often difficult for Big Data practitioners to understand and select the right tools for addressing a given business problem related to Big Data. In this chapter we give an introductory discussion of the various Big Data Tools and Platforms, with the aim of providing the breadth and depth Big Data practitioners need as a reasonable background to support the Big Data initiatives in their organizations. We start with a discussion of the common Technical Concepts and Patterns typically used by the core Big Data Tools and Platforms. Then we delve into the individual characteristics of the different categories of Big Data Tools and Platforms in detail. We also cover the applicability of these categories to various enterprise-level Big Data use cases. Finally, we discuss ongoing work in this space, covering the newer patterns, tools and platforms to watch for the implementation of Big Data use cases.
Sourav Mazumder
Chapter 3. Traffic Identification in Big Internet Data
Abstract
The era of big data brings new challenges to network traffic identification, an essential tool for network management and security. To deal with the problems of dynamic ports and encrypted payload in traditional port-based and payload-based methods, the state-of-the-art method employs flow statistical features and machine learning techniques to identify network traffic. This chapter reviews the statistical-feature-based traffic classification methods that have been proposed in the last decade. We also examine a new problem: unclean traffic in the training stage of machine learning, caused by labeling mistakes and the complex composition of big Internet data. This chapter further evaluates the performance of typical machine learning algorithms with unclean training data. The review and the empirical study can provide a guide for academia and practitioners in choosing proper traffic classification methods in real-world scenarios.
Binfeng Wang, Jun Zhang, Zili Zhang, Wei Luo, Dawen Xia
Chapter 4. Security Theories and Practices for Big Data
Abstract
Big data applications usually require flexible and scalable infrastructure for efficient processing. Cloud computing satisfies these requirements very well and has been widely adopted to provide big data services. However, the outsourcing and resource-sharing features of cloud computing lead to security concerns when applied to big data applications, e.g., confidentiality of data and programs, and integrity of the processing procedure. On the other hand, when the cloud owns the data and provides analytic services, data privacy also becomes a challenge. Security concerns and the pressing demand for adopting big data technology together motivate the development of a special class of security technologies for safe big data processing in cloud environments. These approaches are roughly divided into two categories: designing new algorithms with unique security features and developing security-enhanced systems to protect big data applications. In this chapter, we review the approaches for secure big data processing from both categories, evaluate and compare these technologies from different perspectives, and present a general outlook on the current state of research and development in the field of security theories for big data.
Lei Xu, Weidong Shi
Chapter 5. Rapid Screening of Big Data Against Inadvertent Leaks
Abstract
Keeping sensitive data from unauthorized parties in a highly connected world is challenging. Statistics from security firms, research institutions, and government organizations show that the number of data-leak instances has grown rapidly in recent years. Deliberately planned attacks, inadvertent leaks, and human mistakes constitute the majority of the incidents. In this chapter, we first introduce the threat of data leaks and survey traditional solutions for detecting and preventing leaks of sensitive data. Then we point out new challenges in the era of big data and present the state-of-the-art data-leak detection designs and algorithms. These solutions leverage big data theories and platforms (data mining, MapReduce, GPGPU, etc.) to harden the privacy control for big data. We also discuss the open research problems in data-leak detection and prevention.
Xiaokui Shu, Fang Liu, Danfeng (Daphne) Yao
Chapter 6. Big Data Storage Security
Abstract
The demand for data storage and processing is increasing rapidly in the big data era. The management of such a tremendous volume of data is a critical challenge for data storage systems. Firstly, since 60% of the stored data is claimed to be redundant, data deduplication technology becomes an attractive solution to save storage space and traffic in a big data environment. Secondly, security issues such as the confidentiality, integrity and privacy of big data should also be considered for big data storage. To address these problems, convergent encryption is widely used to secure data deduplication for big data storage. Nonetheless, there still exist some other security issues, such as proof of ownership, key management and so on. In this chapter, we first introduce some major cyber attacks on big data storage. Then, we describe the existing fundamental security techniques, whose integration is essential for protecting data from existing and future security attacks. By discussing some interesting open problems, we finally expect to trigger more research efforts in this new research field.
Mi Wen, Shui Yu, Jinguo Li, Hongwei Li, Kejie Lu
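The Chapter 6 abstract names convergent encryption as the usual way to combine encryption with deduplication. The general idea is that the key is derived from the content itself, so identical plaintexts produce identical ciphertexts that a storage server can deduplicate without seeing the data. The sketch below illustrates only that property. It uses a toy hash-based keystream so that it runs with the Python standard library alone; this is an assumption made for illustration, it is not secure, and it is not the construction used in the chapter. A real system would use a vetted cipher such as AES. All function and class names are hypothetical.

```python
import hashlib


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key plus a counter (illustration only, NOT secure)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(plaintext: bytes):
    """Derive the key from the content, then encrypt deterministically."""
    key = hashlib.sha256(plaintext).digest()
    stream = _keystream(key, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return key, ciphertext


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same keystream to recover the plaintext."""
    stream = _keystream(key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


class DedupStore:
    """Hypothetical server-side store that keeps one copy per distinct ciphertext."""

    def __init__(self):
        self.blobs = {}

    def put(self, ciphertext: bytes) -> str:
        tag = hashlib.sha256(ciphertext).hexdigest()
        self.blobs.setdefault(tag, ciphertext)  # stored once, however many uploads
        return tag


# Two users upload the same file independently.
store = DedupStore()
key_a, ct_a = convergent_encrypt(b"quarterly report")
key_b, ct_b = convergent_encrypt(b"quarterly report")
tag_a = store.put(ct_a)
tag_b = store.put(ct_b)
```

Because both users derive the same key from the same content, the store ends up holding a single blob. The same determinism is also why issues such as proof of ownership and key management, mentioned in the abstract, remain open.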
Chapter 7. Cyber Attacks on MapReduce Computation Time in a Hadoop Cluster
Abstract
In this chapter, we addressed the security issue in a Hadoop cluster when some nodes are compromised. We investigated the impact of attacks on the completion time of a MapReduce job when a node is compromised in a Hadoop cluster. We studied three attack methods: (1) blocking all incoming data from the master node except for the special messages that relay the status of the slave node, (2) delaying the delivery of packets that are sent to the master node, and (3) performing an attack such as a denial-of-service attack against the master node. To understand the impact of these attacks, we implemented them on different cluster settings that consist of three, six, and nine slave nodes and a single master node in our testbed. Our data shows these attacks can affect the performance of MapReduce by increasing the computing time of MapReduce jobs.
William Glenn, Wei Yu
Chapter 8. Security and Privacy for Big Data
Abstract
Security and privacy are among the critical issues for big data and have drawn great attention from both industry and the research community. Following this major trend, in this chapter we provide an overview of state-of-the-art research issues and achievements in the field of security and privacy of big data, highlighting recent advances in data encryption, privacy preservation and trust management. In the section on data encryption, searchable encryption, order-preserving encryption, structured encryption and homomorphic encryption are analyzed in turn. In the section on privacy preservation, three representative mechanisms, namely access control, auditing and statistical privacy, are reviewed. In the section on trust management, several approaches, especially trusted-computing-based approaches and trust and reputation models, are investigated. In addition, current security measures for big data platforms, particularly for Apache Hadoop, are discussed. The approaches selected for this survey represent only a small fraction of the wide research effort within security and privacy of big data. Nevertheless, they serve as an indication of the diversity of challenges that are being addressed.
Shuyu Li, Jerry Gao
Chapter 9. Big Data Applications in Engineering and Science
Abstract
Research to solve engineering and science problems commonly requires the collection and complex analysis of a vast amount of data. This makes such problems a natural exemplar of big data applications. For example, data from weather stations, high-resolution images from CT scans, or data captured by astronomical instruments all easily showcase one or more big data characteristics, i.e., volume, velocity, variety and veracity. These big data characteristics present computational and analytical challenges that need to be overcome in order to deliver engineering solutions or make scientific discoveries. In this chapter, we catalogue engineering and science problems that carry a big data angle. We also discuss the research advances for these problems and present a list of tools available to the practitioner. A number of big data application exemplars from the past works of the authors are discussed in further depth, highlighting the association between the specific problem and its big data characteristics. The overview from these various perspectives will provide the reader with an up-to-date audit of big data developments in engineering and science.
Kok-Leong Ong, Daswin De Silva, Yee Ling Boo, Ee Hui Lim, Frank Bodi, Damminda Alahakoon, Simone Leao
Chapter 10. Geospatial Big Data for Environmental and Agricultural Applications
Abstract
Earth observation (EO) and environmental geospatial datasets are growing at an unprecedented rate in size, variety and complexity, thus creating new challenges and opportunities as far as their access, archiving, processing and analytics are concerned. Currently, huge imaging streams are reaching several petabytes in many satellite archives worldwide. In this chapter, we review the current state of the art in big data frameworks able to access, handle, process, analyse and deliver geospatial data and value-added products. Operational services that feature efficient implementations and different architectures, allowing in certain cases online and near-real-time processing and analytics, are detailed. Based on the current status, state of the art and emerging challenges, the present study highlights certain issues, insights and future directions towards the efficient exploitation of EO big data for important engineering, environmental and agricultural applications.
Athanasios Karmas, Angelos Tzotsos, Konstantinos Karantzalos
Chapter 11. Big Data in Finance
Abstract
Quantitative finance is an area in which data is the vital actionable information in all aspects. Leading financial institutions and firms are adopting advanced Big Data technologies to gain actionable insights from massive market data, standardize financial data from a variety of sources, reduce the response time to real-time data streams, and improve the scalability of algorithms and software stacks on novel architectures. Today, these major benefits are driving pioneering financial practitioners to develop and deploy big data solutions in financial products, ranging from front-office algorithmic trading to back-office data management and analytics.
Beyond the collection and purification of multi-source data, the effective visualization of high-throughput data streams and rapid programmability on massively parallel processing architectures are widely used to facilitate algorithmic trading and research. Big data analytics can help reveal more hidden market opportunities by analyzing high-volume structured data and social news, in contrast to underperformers that are incapable of adopting novel techniques. Being able to process massive complex events at ultra-fast speed removes the roadblock to promptly capturing market trends and managing risks in a timely manner.
These key trends in capital markets and extensive examples in quantitative finance are systematically highlighted in this chapter. The insufficiency of technological adaptation and the gap between research and practice are also presented.
To clarify matters, the three characteristics of Big Data (volume, velocity and variety) are used as a prism through which to understand the pitfalls and opportunities of established and emerging technologies for financial services.
Bin Fang, Peng Zhang
Chapter 12. Big Data Applications in Business Analysis
Abstract
How can service providers turn their big data into actionable knowledge that drives profitable business results? Using the real-world case of China Southern Airlines, this chapter illustrates how big data analytics can help airline companies to develop a comprehensive 360-degree view of their passengers. This chapter introduces a number of data mining techniques, including Weibo customer value modeling, social network analysis, website click-stream analysis, customer activity analysis, clustering analysis, Recency-Frequency-Monetary (RFM) analysis, and principal component analysis. Using the sample dataset provided by the airline company, this chapter demonstrates how to apply big data techniques to explore passengers’ travel patterns and social networks, predict how many times the passengers will travel in the future, and segment customer groups based on customer lifetime value. In addition, this chapter introduces a multi-channel intelligent customer marketing platform for airlines. The findings of this study provide airline companies with useful insights to better understand passenger behavior and develop effective strategies for customer relationship management.
Sien Chen, Yinghua Huang, Wenqiang Huang
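Of the techniques listed in the Chapter 12 abstract, Recency-Frequency-Monetary (RFM) analysis is simple enough to sketch in a few lines. In its common form it summarizes each customer by the days since the last purchase, the number of purchases, and the total amount spent. The example below is a minimal sketch under that common definition. The passenger identifiers, dates and amounts are made-up illustrative values, not data from the chapter, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date


def rfm(transactions, reference_day):
    """Return {customer: (recency_days, frequency, monetary)}.

    `transactions` is an iterable of (customer, purchase_date, amount) tuples.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        # Track the most recent purchase date per customer.
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: (
            (reference_day - last_purchase[customer]).days,
            frequency[customer],
            monetary[customer],
        )
        for customer in last_purchase
    }


# Hypothetical ticket purchases for two passengers.
bookings = [
    ("passenger_a", date(2015, 1, 1), 100.0),
    ("passenger_a", date(2015, 3, 1), 50.0),
    ("passenger_b", date(2015, 2, 1), 20.0),
]
scores = rfm(bookings, date(2015, 3, 31))
```

The three values per customer are typically binned into scores and used as input to segmentation, for example the clustering analysis the abstract also mentions.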
Metadata
Title
Big Data Concepts, Theories, and Applications
Edited by
Shui Yu
Song Guo
Copyright year
2016
Electronic ISBN
978-3-319-27763-9
Print ISBN
978-3-319-27761-5
DOI
https://doi.org/10.1007/978-3-319-27763-9