2016 | Book

Big Data SMACK

A Guide to Apache Spark, Mesos, Akka, Cassandra, and Kafka

About this book

This book is about how to integrate a full-stack open source big data architecture and how to choose the correct technology (Scala/Spark, Mesos, Akka, Cassandra, and Kafka) in every layer. Big data architecture is becoming a requirement for many different enterprises. So far, however, the focus has largely been on collecting, aggregating, and crunching large datasets in a timely manner. In many cases now, organizations need more than one paradigm to perform efficient analyses.

Big Data SMACK explains each of the full-stack technologies and, more importantly, how to best integrate them. It provides detailed coverage of the practical benefits of these technologies and incorporates real-world examples in every situation. The book focuses on the problems and scenarios solved by the architecture, as well as the solutions provided by every technology. It covers the six main concepts of big data architecture and how to integrate, replace, and reinforce every layer:

The language: Scala

The engine: Spark (SQL, MLlib, Streaming, GraphX)

The container: Mesos, Docker

The view: Akka

The storage: Cassandra

The message broker: Kafka

What you’ll learn

How to make big data architecture without using complex Greek letter architectures.

How to build a cheap but effective cluster infrastructure.

How to make queries, reports, and graphs that business demands.

How to manage and exploit unstructured and NoSQL data sources.

How to use tools to monitor the performance of your architecture.

How to integrate all technologies and decide which to replace and which to reinforce.

Who This Book Is For

This book is for developers, data architects, and data scientists who want to learn how to integrate the most successful big data open stack architecture and how to choose the correct technology in every layer.

Table of Contents

Frontmatter

Introduction

Frontmatter
Chapter 1. Big Data, Big Challenges
Abstract
In this chapter, we expose the modern architecture challenges facing the SMACK stack (Apache Spark, Mesos, Akka, Cassandra, and Kafka). Also, we present dynamic processing environment problems to see which conditions are suitable and which are not.
Raul Estrada, Isaac Ruiz
Chapter 2. Big Data, Big Solutions
Abstract
Many systems are monitoring a continuous stream of events: weather events, GPS signals, vital signs, logs, device metrics…. The list is endless. The natural way to collect and analyze this information is as a stream of data.
Raul Estrada, Isaac Ruiz

Playing SMACK

Frontmatter
Chapter 3. The Language: Scala
Abstract
The main part of the SMACK stack is Spark, but sometimes the S is for Scala. You can develop in Spark in four languages: Java, Scala, Python, and R. Because Apache Spark is written in Scala, and this book is focused on streaming architecture, we are going to show examples in only the Scala language.
Raul Estrada, Isaac Ruiz
Chapter 4. The Model: Akka
Abstract
If the previous chapter’s objective was to develop functional thinking, this chapter’s objective is to develop actor model thinking.
Raul Estrada, Isaac Ruiz
Chapter 5. Storage: Apache Cassandra
Abstract
Congratulations! You are almost halfway through this journey. You are at the point where it is necessary to meet the component responsible for information persistence; the sometimes neglected “data layer” will take on a new dimension when you have finished this chapter. It’s time to meet Apache Cassandra, a NoSQL database that provides high availability and scalability without compromising performance.
Raul Estrada, Isaac Ruiz
Chapter 6. The Engine: Apache Spark
Abstract
If our stack were a vehicle, now we have reached the engine. As an engine, we will disarm it, analyze it, master it, improve it, and run it to the limit.
Raul Estrada, Isaac Ruiz
Chapter 7. The Manager: Apache Mesos
Abstract
We are reaching the end of this trip. In this chapter, you will learn how to create your own cluster in a simple way.
Raul Estrada, Isaac Ruiz
Chapter 8. The Broker: Apache Kafka
Abstract
The goal of this chapter is to get you familiar with Apache Kafka and show you how to solve the consumption of millions of messages in a pipeline architecture. Here we show some Scala examples to give you a solid foundation for the different types of implementations and integrations for Kafka producers and consumers.
Raul Estrada, Isaac Ruiz

Improving SMACK

Frontmatter
Chapter 9. Fast Data Patterns
Abstract
In this chapter, we examine well-known patterns in developing fast data applications. As you know, there are two approaches: (1) the traditional, on-disk batch approach and (2) the modern, in-memory streaming approach. The patterns in this chapter apply to both approaches.
Raul Estrada, Isaac Ruiz
Chapter 10. Data Pipelines
Abstract
Well, we have reached the chapter where we have to connect everything, especially theory and practice. This chapter has two parts: the first part is an enumeration of the data pipeline strategies and the second part is how to connect the technologies.
Raul Estrada, Isaac Ruiz
Chapter 11. Glossary
Abstract
This glossary of terms and concepts aids in understanding the SMACK stack.
Raul Estrada, Isaac Ruiz
Backmatter
Metadata
Title
Big Data SMACK
Written by
Raul Estrada
Isaac Ruiz
Copyright year
2016
Publisher
Apress
Electronic ISBN
978-1-4842-2175-4
Print ISBN
978-1-4842-2174-7
DOI
https://doi.org/10.1007/978-1-4842-2175-4