
2018 | Book

Models of Computation for Big Data


About this Book

The big data tsunami is changing how industrial and academic research addresses both foundational questions and practical applications. This calls for a paradigm shift in algorithms and in the underlying mathematical techniques: we need to understand the foundations and to tackle the state-of-the-art challenges in big data that can lead to practical impact. The main goal of this book is to introduce algorithmic techniques for dealing with big data sets. Traditional algorithms work well when the input data fits in memory; in many recent applications, however, the input is far too large to do so.

Models of Computation for Big Data covers mathematical models for developing such algorithms, which have their roots in the study of the big data sets that arise in many applications. Most of the techniques discussed come from research of the last decade. The book is structured as a sequence of algorithmic ideas, each with its theoretical underpinning and practical use. Intended for both graduate students and advanced undergraduate students, it has no formal prerequisites, but readers should be familiar with the fundamentals of algorithm design and analysis, discrete mathematics, and probability, and should have general mathematical maturity.

Table of Contents

Frontmatter
Chapter 1. Streaming Models
Abstract
In the analysis of big data there are queries that do not scale, because they need massive computing resources and time to produce exact results; examples include count-distinct, most-frequent items, joins, matrix computations, and graph analysis. If approximate results are acceptable, a class of dedicated algorithms, known as streaming algorithms or sketches, can produce results orders of magnitude faster and with rigorously proven error bounds. For interactive queries there may be no other practical option, and for real-time analysis, sketches are the only recognized solution. A small illustrative example follows this entry.
Rajendra Akerkar
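To make the flavor of such sketches concrete, here is a minimal Count-Min sketch in Python. It is an illustrative sample of the streaming/sketching paradigm the chapter covers, not code from the book; the width and depth defaults are hypothetical values trading memory for accuracy.

```python
import hashlib

class CountMinSketch:
    """Approximate per-item frequencies over a stream in fixed memory."""

    def __init__(self, width=2000, depth=5):
        self.width = width   # counters per row; error shrinks as width grows
        self.depth = depth   # independent rows; failure probability shrinks with depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, derived by salting with the row number.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += 1

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum over rows
        # is an upper bound that is tight with high probability.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

cms = CountMinSketch()
for token in ["a", "b", "a", "c", "a"]:
    cms.add(token)
print(cms.estimate("a"))  # >= 3; typically exactly 3
```

Because each row can only overcount, never undercount, taking the minimum across rows yields an estimate with a one-sided, provably bounded error, which is the kind of guarantee the abstract refers to.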
Chapter 2. Sub-linear Time Models
Abstract
Sub-linear time algorithms represent a new paradigm in computing, in which an algorithm must give some sort of answer after inspecting only a very small portion of the input. The paradigm has its roots in the study of the massive data sets that occur more and more frequently in various applications; financial transactions with billions of records and Internet traffic analyses are examples of modern data sets of unprecedented scale. Managing and analysing such data sets forces us to reconsider the traditional notion of an efficient algorithm: processing them in more than linear time is far too expensive, and often even linear-time algorithms are too slow. Hence the need for algorithms whose running times are not merely polynomial but sub-linear in n. A toy example follows this entry.
Rajendra Akerkar
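As a toy illustration of the sub-linear paradigm (again, not from the book), the following Python snippet estimates the fraction of ones in a large 0/1 array by random sampling. The sample size comes from a standard Hoeffding bound and depends only on the accuracy parameters, not on the input size n.

```python
import math
import random

def estimate_fraction(data, epsilon=0.05, delta=0.01):
    """Estimate the fraction of ones in a 0/1 sequence by sampling.

    By a Hoeffding bound, m = ln(2/delta) / (2 * epsilon**2) samples give an
    estimate within epsilon of the truth with probability at least 1 - delta.
    Crucially, m does not depend on len(data).
    """
    m = math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))
    hits = sum(data[random.randrange(len(data))] for _ in range(m))
    return hits / m

huge = [1] * 300_000 + [0] * 700_000   # true fraction of ones: 0.3
print(estimate_fraction(huge))          # about 0.3, after ~1,060 lookups
```

The algorithm inspects roughly a thousand positions regardless of whether the array has a million or a trillion entries, which is exactly the sense in which its running time is sub-linear in n.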
Chapter 3. Linear Algebraic Models
Abstract
This chapter presents some of the fundamental linear algebraic tools for large-scale data analysis and machine learning. Specifically, the focus falls on large-scale linear algebra, including iterative, approximate, and randomized algorithms for basic linear algebra computations and matrix functions. Such algorithms are mostly pass-efficient, requiring only a constant number of passes over the matrix data to create samples or sketches and to do the remaining work. Most of them need at least two passes to achieve their guarantees on error or failure probability, but some need only one; a one-pass algorithm is close to the streaming model of computation, in which the data is read once and the resource bounds are sublinear in the data size. An illustrative example follows this entry.
Rajendra Akerkar
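One representative pass-efficient primitive is approximate matrix multiplication via a random Gaussian sketch. The snippet below assumes NumPy and is an illustrative instance of the randomized-algorithms theme, with k a hypothetical sketch size; it is not the book's own code.

```python
import numpy as np

def sketched_product(A, B, k=400, seed=0):
    """Approximate A @ B as (A @ S.T) @ (S @ B) with a k-by-n Gaussian sketch S.

    E[S.T @ S] = I, so the estimator is unbiased, and the Frobenius-norm error
    shrinks roughly like 1/sqrt(k). Conceptually, forming A @ S.T needs one
    pass over A, and forming S @ B needs one pass over B.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    S = rng.standard_normal((k, n)) / np.sqrt(k)
    return (A @ S.T) @ (S @ B)

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5000))
B = rng.standard_normal((5000, 80))
exact = A @ B
approx = sketched_product(A, B, k=400)
print(np.linalg.norm(exact - approx) / np.linalg.norm(exact))  # modest relative error
```

The design choice is the usual sketching trade-off: the two small factors A @ S.T and S @ B take far less memory than A and B, and the accuracy knob k can be raised at the cost of more space and arithmetic.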
Chapter 4. Assorted Computational Models
Abstract
This chapter presents several other computational models for tackling massive datasets efficiently. We formalize models for a number of massive-data settings and explore the core algorithmic ideas that arise in them. The models discussed are the cell-probe model, online bipartite matching, the MapReduce programming model, Markov chains, and crowdsourcing. Finally, we present some basic aspects of communication complexity. A toy MapReduce example follows this entry.
Rajendra Akerkar
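To fix ideas for the MapReduce model specifically, here is a self-contained Python simulation of the classic word-count job. The map_phase/shuffle/reduce_phase names are hypothetical, chosen to mirror the three phases; the snippet only mimics locally what a distributed framework would do across machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Mapper: emit a (word, 1) pair for every word in one input split.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Shuffle: the framework groups all emitted values by key between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: combine each key's values; for word count, just sum them.
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data models", "models of computation", "big models"]
counts = reduce_phase(shuffle(chain.from_iterable(map(map_phase, docs))))
print(counts)  # {'big': 2, 'data': 1, 'models': 3, 'of': 1, 'computation': 1}
```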
Backmatter
Metadata
Title
Models of Computation for Big Data
Author
Rajendra Akerkar
Copyright Year
2018
Electronic ISBN
978-3-319-91851-8
Print ISBN
978-3-319-91850-1
DOI
https://doi.org/10.1007/978-3-319-91851-8