
About This Book

This book is about how to integrate a full-stack, open source big data architecture and how to choose the correct technology for every layer: Scala/Spark, Mesos, Akka, Cassandra, and Kafka. Big data architecture is becoming a requirement for many different enterprises. So far, however, the focus has largely been on collecting, aggregating, and crunching large datasets in a timely manner. In many cases now, organizations need more than one paradigm to perform efficient analyses.

Big Data SMACK explains each of the full-stack technologies and, more importantly, how best to integrate them. It provides detailed coverage of the practical benefits of these technologies and incorporates real-world examples in every situation. The book focuses on the problems and scenarios the architecture solves, as well as the solutions each technology provides. It covers the six main concepts of big data architecture and how to integrate, replace, and reinforce each layer:

The language: Scala

The engine: Spark (SQL, MLlib, Streaming, GraphX)

The container: Mesos, Docker

The view: Akka

The storage: Cassandra

The message broker: Kafka

What you’ll learn

How to build a big data architecture without resorting to complex "Greek letter" architectures (such as Lambda and Kappa).

How to build a cheap but effective cluster infrastructure.

How to produce the queries, reports, and graphs that the business demands.

How to manage and exploit unstructured and NoSQL data sources.

How to use tools to monitor the performance of your architecture.

How to integrate all the technologies and decide which ones to replace and which to reinforce.

Who This Book Is For

This book is for developers, data architects, and data scientists who want to learn how to integrate the most successful open source big data stack and how to choose the correct technology for every layer.

Table of Contents

Frontmatter

Introduction


Chapter 1. Big Data, Big Challenges

Abstract
In this chapter, we examine the modern architectural challenges facing the SMACK stack (Apache Spark, Mesos, Akka, Cassandra, and Kafka). We also present the problems of dynamic processing environments to show which conditions are suitable for the stack and which are not.
Raul Estrada, Isaac Ruiz

Chapter 2. Big Data, Big Solutions

Abstract
Many systems monitor a continuous stream of events: weather events, GPS signals, vital signs, logs, device metrics, and more. The list is endless. The natural way to collect and analyze this information is as a stream of data.
Raul Estrada, Isaac Ruiz

Playing SMACK


Chapter 3. The Language: Scala

Abstract
The main part of the SMACK stack is Spark, but sometimes the S stands for Scala. You can develop Spark applications in four languages: Java, Scala, Python, and R. Because Apache Spark is written in Scala, and this book is focused on streaming architecture, we show examples only in Scala.
Raul Estrada, Isaac Ruiz

Chapter 4. The Model: Akka

Abstract
If the previous chapter’s objective was to develop functional thinking, this chapter’s objective is to develop actor model thinking.
Raul Estrada, Isaac Ruiz

Chapter 5. Storage: Apache Cassandra

Abstract
Congratulations! You are almost halfway through this journey. You are at the point where it is necessary to meet the component responsible for information persistence; the sometimes neglected “data layer” will take on a new dimension when you have finished this chapter. It’s time to meet Apache Cassandra, a NoSQL database that provides high availability and scalability without compromising performance.
Raul Estrada, Isaac Ruiz

Chapter 6. The Engine: Apache Spark

Abstract
If our stack were a vehicle, we have now reached the engine. We will disassemble it, analyze it, master it, improve it, and run it to the limit.
Raul Estrada, Isaac Ruiz

Chapter 7. The Manager: Apache Mesos

Abstract
We are reaching the end of this trip. In this chapter, you will learn how to create your own cluster in a simple way.
Raul Estrada, Isaac Ruiz

Chapter 8. The Broker: Apache Kafka

Abstract
The goal of this chapter is to get you familiar with Apache Kafka and show you how to handle the consumption of millions of messages in a pipeline architecture. We show some Scala examples to give you a solid foundation for the different types of implementations and integrations of Kafka producers and consumers.
Raul Estrada, Isaac Ruiz

Improving SMACK


Chapter 9. Fast Data Patterns

Abstract
In this chapter, we examine well-known patterns in developing fast data applications. As you know, there are two approaches: (1) the traditional batch approach, on disk, and (2) the modern streaming approach, in memory. The patterns in this chapter apply to both approaches.
Raul Estrada, Isaac Ruiz

Chapter 10. Data Pipelines

Abstract
Well, we have reached the chapter where we have to connect everything, especially theory and practice. This chapter has two parts: the first enumerates data pipeline strategies, and the second shows how to connect the technologies.
Raul Estrada, Isaac Ruiz

Chapter 11. Glossary

Abstract
This glossary of terms and concepts aids in understanding the SMACK stack.
Raul Estrada, Isaac Ruiz

Backmatter
