
2018 | Book

Models of Computation for Big Data


About this Book

The big data tsunami is changing how industrial and academic research addresses both foundational questions and practical applications. This calls for a paradigm shift in algorithms and in the underlying mathematical techniques: we need to understand the foundations and to tackle the state-of-the-art challenges in big data that can lead to practical impact. The main goal of this book is to introduce algorithmic techniques for dealing with big data sets. Traditional algorithms work well when the input data fits in memory; in many recent applications, however, the input is far too large to do so.

Models of Computation for Big Data covers mathematical models for developing such algorithms, which have their roots in the study of the big data sets that arise in many applications. Most of the techniques discussed come from research of the last decade. The book is structured as a sequence of algorithmic ideas, each with its theoretical underpinning and practical use. Intended for both graduate students and advanced undergraduate students, it has no formal prerequisites, but readers should be familiar with the fundamentals of algorithm design and analysis, discrete mathematics, and probability, and should have general mathematical maturity.

Table of Contents

Frontmatter
Chapter 1. Streaming Models
Abstract
In the analysis of big data there are queries that do not scale, because they need massive computing resources and time to produce exact results; examples include count-distinct, most-frequent items, joins, matrix computations, and graph analysis. If approximate results are acceptable, a class of dedicated algorithms, known as streaming algorithms or sketches, can produce results orders of magnitude faster and with rigorously proven error bounds. For interactive queries there may be no other practical option, and for real-time analysis, sketches are the only recognized solution. A small illustrative example follows this entry.
Rajendra Akerkar
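To make the flavor of such sketches concrete, here is a minimal Count-Min sketch in Python. It is an illustrative sample of the streaming/sketching paradigm the chapter covers, not code from the book; the width and depth defaults are hypothetical values trading memory for accuracy.

```python
import hashlib

class CountMinSketch:
    """Approximate per-item frequencies over a stream in fixed memory."""

    def __init__(self, width=2000, depth=5):
        self.width = width   # counters per row; error shrinks as width grows
        self.depth = depth   # independent rows; failure probability shrinks with depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, derived by salting with the row number.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += 1

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum over rows
        # is an upper bound that is tight with high probability.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

cms = CountMinSketch()
for token in ["a", "b", "a", "c", "a"]:
    cms.add(token)
print(cms.estimate("a"))  # >= 3; typically exactly 3
```

Because each row can only overcount, never undercount, taking the minimum across rows yields an estimate with a one-sided, provably bounded error, which is the kind of guarantee the abstract refers to.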
Chapter 2. Sub-linear Time Models
Abstract
Sub-linear time algorithms represent a new paradigm in computing, in which an algorithm must give some sort of answer after inspecting only a very small portion of the input. The paradigm has its roots in the study of the massive data sets that occur more and more frequently in various applications; financial transactions with billions of records and Internet traffic analyses are examples of modern data sets of unprecedented scale. Managing and analysing such data sets forces us to reconsider the traditional notion of an efficient algorithm: processing them in more than linear time is far too expensive, and often even linear-time algorithms are too slow. Hence the need for algorithms whose running times are not merely polynomial but sub-linear in n. A toy example follows this entry.
Rajendra Akerkar
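As a toy illustration of the sub-linear paradigm (again, not from the book), the following Python snippet estimates the fraction of ones in a large 0/1 array by random sampling. The sample size comes from a standard Hoeffding bound and depends only on the accuracy parameters, not on the input size n.

```python
import math
import random

def estimate_fraction(data, epsilon=0.05, delta=0.01):
    """Estimate the fraction of ones in a 0/1 sequence by sampling.

    By a Hoeffding bound, m = ln(2/delta) / (2 * epsilon**2) samples give an
    estimate within epsilon of the truth with probability at least 1 - delta.
    Crucially, m does not depend on len(data).
    """
    m = math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))
    hits = sum(data[random.randrange(len(data))] for _ in range(m))
    return hits / m

huge = [1] * 300_000 + [0] * 700_000   # true fraction of ones: 0.3
print(estimate_fraction(huge))          # about 0.3, after ~1,060 lookups
```

The algorithm inspects roughly a thousand positions regardless of whether the array has a million or a trillion entries, which is exactly the sense in which its running time is sub-linear in n.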
Chapter 3. Linear Algebraic Models
Abstract
This chapter presents some of the fundamental linear algebraic tools for large-scale data analysis and machine learning. Specifically, the focus falls on large-scale linear algebra, including iterative, approximate, and randomized algorithms for basic linear algebra computations and matrix functions. Such algorithms are mostly pass-efficient, requiring only a constant number of passes over the matrix data to create samples or sketches and to do the remaining work. Most of them need at least two passes to achieve their guarantees on error or failure probability, but some need only one; a one-pass algorithm is close to the streaming model of computation, in which the data is read once and the resource bounds are sublinear in the data size. An illustrative example follows this entry.
Rajendra Akerkar
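One representative pass-efficient primitive is approximate matrix multiplication via a random Gaussian sketch. The snippet below assumes NumPy and is an illustrative instance of the randomized-algorithms theme, with k a hypothetical sketch size; it is not the book's own code.

```python
import numpy as np

def sketched_product(A, B, k=400, seed=0):
    """Approximate A @ B as (A @ S.T) @ (S @ B) with a k-by-n Gaussian sketch S.

    E[S.T @ S] = I, so the estimator is unbiased, and the Frobenius-norm error
    shrinks roughly like 1/sqrt(k). Conceptually, forming A @ S.T needs one
    pass over A, and forming S @ B needs one pass over B.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    S = rng.standard_normal((k, n)) / np.sqrt(k)
    return (A @ S.T) @ (S @ B)

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5000))
B = rng.standard_normal((5000, 80))
exact = A @ B
approx = sketched_product(A, B, k=400)
print(np.linalg.norm(exact - approx) / np.linalg.norm(exact))  # modest relative error
```

The design choice is the usual sketching trade-off: the two small factors A @ S.T and S @ B take far less memory than A and B, and the accuracy knob k can be raised at the cost of more space and arithmetic.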
Chapter 4. Assorted Computational Models
Abstract
This chapter presents several other computational models for tackling massive datasets efficiently. We formalize models for a number of massive-data settings and explore the core algorithmic ideas that arise in them. The models discussed are the cell-probe model, online bipartite matching, the MapReduce programming model, Markov chains, and crowdsourcing. Finally, we present some basic aspects of communication complexity. A toy MapReduce example follows this entry.
Rajendra Akerkar
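To fix ideas for the MapReduce model specifically, here is a self-contained Python simulation of the classic word-count job. The map_phase/shuffle/reduce_phase names are hypothetical, chosen to mirror the three phases; the snippet only mimics locally what a distributed framework would do across machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Mapper: emit a (word, 1) pair for every word in one input split.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Shuffle: the framework groups all emitted values by key between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: combine each key's values; for word count, just sum them.
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data models", "models of computation", "big models"]
counts = reduce_phase(shuffle(chain.from_iterable(map(map_phase, docs))))
print(counts)  # {'big': 2, 'data': 1, 'models': 3, 'of': 1, 'computation': 1}
```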
Backmatter
Metadata
Title
Models of Computation for Big Data
Author
Rajendra Akerkar
Copyright Year
2018
Electronic ISBN
978-3-319-91851-8
Print ISBN
978-3-319-91850-1
DOI
https://doi.org/10.1007/978-3-319-91851-8