About this book

Leverage Phoenix as an ANSI SQL engine built on top of the highly distributed and scalable NoSQL framework HBase. Learn the basics and best practices adopted in Phoenix to enable high write and read throughput in a big data environment.

This book includes real-world cases such as Internet of Things devices that send continuous streams of data to Phoenix, and it explains how key features such as joins, indexes, transactions, and functions add up to the simple, flexible, and powerful API that Phoenix provides. Examples built around real-time data and data-driven businesses show you how to collect, analyze, and act in seconds.

Pro Apache Phoenix covers the nuances of setting up a distributed HBase cluster with Phoenix libraries, running performance benchmarks, configuring parameters for production scenarios, and viewing the results. The book also shows how Phoenix plays well with other key frameworks in the Hadoop ecosystem such as Apache Spark, Pig, Flume, and Sqoop.

You will learn how to:

Handle a petabyte data store by applying familiar SQL techniques

Store, analyze, and manipulate data in a NoSQL Hadoop ecosystem with HBase

Apply best practices while working with a scalable data store on Hadoop and HBase

Integrate popular frameworks (Apache Spark, Pig, Flume) to simplify big data analysis

Demonstrate real-time use cases and big data modeling techniques

Who This Book Is For

Data engineers, big data administrators, and architects.

Table of contents

Frontmatter

Chapter 1. Introduction

From the inception of mainframes to modern cloud storage and mobile devices, the amount of data produced has risen steeply. Today, humans produce large amounts of data as they go about their day-to-day activities and business operations. For decades, much of the data produced was not used for analysis or business decisions. Nevertheless, data has always been indispensable for both small and large enterprises, and with digitalization its importance and value have become integral to business decisions. Take the example of online retailers who base business predictions on user clicks and purchasing patterns, actions that generate huge amounts of data. By applying analytical tools to this data, the retailer gleans valuable information for decision making. One can imagine the flood of data pouring from a smart house or a smart city.

Shakil Akhtar, Ravi Magham

Chapter 2. Using Phoenix

Apache Phoenix is a coating of traditional SQL-like syntactic sugar applied to Hadoop’s HBase NoSQL database. It was created as an internal project at Salesforce, later open-sourced on GitHub, and became a top-level Apache project in a very short period of time. HBase, the Hadoop database, is a highly scalable NoSQL database. You can query HBase data through Phoenix with a syntax similar to the SQL used for relational databases. Apache Phoenix provides a JDBC driver and works as an SQL driver for HBase. Phoenix queries are optimized primarily for HBase and use many HBase-specific techniques, such as skip scan, to improve performance. We will cover skip scan and other advanced Phoenix topics in later chapters.
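
To give a flavor of the JDBC access this chapter describes, here is a minimal sketch of connecting to Phoenix and listing a few entries from the built-in SYSTEM.CATALOG table; the ZooKeeper quorum host ("localhost") is a placeholder for your own cluster.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PhoenixHello {
        public static void main(String[] args) throws Exception {
            // The Phoenix JDBC URL names the HBase ZooKeeper quorum.
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }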

Shakil Akhtar, Ravi Magham

Chapter 3. CRUD with Phoenix

Now that we have installed Phoenix and HBase, let’s get started with performing the basic operations of CREATE, UPDATE, DELETE, and SELECT using SQL. Let’s also take a dive into the data types and perform CRUD operations from the Sqlline CLI that ships with Phoenix, as sketched below.
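
A minimal sketch of those operations through JDBC; the USERS table is illustrative, and note that Phoenix expresses both inserts and updates as UPSERT:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CrudSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                 Statement stmt = conn.createStatement()) {
                stmt.executeUpdate("CREATE TABLE IF NOT EXISTS USERS ("
                        + "ID BIGINT NOT NULL PRIMARY KEY, NAME VARCHAR)");
                // Phoenix has no INSERT/UPDATE; UPSERT covers both.
                stmt.executeUpdate("UPSERT INTO USERS VALUES (1, 'Alice')");
                conn.commit(); // Phoenix connections do not auto-commit by default
                try (ResultSet rs = stmt.executeQuery("SELECT ID, NAME FROM USERS")) {
                    while (rs.next()) {
                        System.out.println(rs.getLong(1) + " " + rs.getString(2));
                    }
                }
                stmt.executeUpdate("DELETE FROM USERS WHERE ID = 1");
                conn.commit();
            }
        }
    }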

Shakil Akhtar, Ravi Magham

Chapter 4. Querying Data

In the previous chapter we discussed basic Phoenix commands for CRUD operations. In this chapter we will dig deeper into working with tables (creating, altering, and dropping them), the clauses available in Phoenix (LIMIT, WHERE, GROUP BY, HAVING, and ORDER BY), data constraints (NOT NULL), and conditional operators (AND, OR, IN, LIKE, and BETWEEN) for data retrieval.
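
As a preview, a single query exercising those clauses; the EVENTS table and its HOST and DURATION columns are assumptions for illustration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class QueryClauses {
        public static void main(String[] args) throws Exception {
            // Filters, aggregates, and sorts a hypothetical EVENTS table.
            String sql = "SELECT HOST, COUNT(*) AS CNT "
                    + "FROM EVENTS "
                    + "WHERE DURATION BETWEEN 10 AND 5000 AND HOST LIKE 'web%' "
                    + "GROUP BY HOST "
                    + "HAVING COUNT(*) > 100 "
                    + "ORDER BY COUNT(*) DESC "
                    + "LIMIT 10";
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " " + rs.getLong(2));
                }
            }
        }
    }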

Shakil Akhtar, Ravi Magham

Chapter 5. Advanced Querying

In general querying, we saw how to work with a single table. Now let’s explore how to work with multiple tables. When we want to retrieve data from more than one table in a database, joins are used to collect the required columns in a single query. Joins are heavier and slower than plain queries, but Phoenix supports many configurations and hints to fine-tune join performance for faster results. We will discuss them in this chapter in the section on join optimizations.
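
For example, a join between two illustrative tables, using one of the hints the chapter covers; USE_SORT_MERGE_JOIN asks Phoenix for a sort-merge join instead of its default hash join:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class JoinHintSketch {
        public static void main(String[] args) throws Exception {
            // ORDERS and CUSTOMERS are hypothetical tables for illustration.
            String sql = "SELECT /*+ USE_SORT_MERGE_JOIN */ o.ORDER_ID, c.NAME "
                    + "FROM ORDERS o JOIN CUSTOMERS c ON o.CUSTOMER_ID = c.ID "
                    + "WHERE o.AMOUNT > 100";
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " " + rs.getString(2));
                }
            }
        }
    }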

Shakil Akhtar, Ravi Magham

Chapter 6. Transactions

When we discuss databases, either relational or non-relational, transactions are one of the most important considerations for ensuring data integrity and dealing with concurrent tasks. Transactions also play an important role in handling database errors and avoiding inconsistent states. Transactions are an integral part of relational databases. Although NoSQL databases generally do not support transactions, some enable them with the help of a transaction manager. Similarly, Phoenix brings transaction support to HBase by using Apache Tephra as its transaction manager. In this chapter we will see how Phoenix supports transactions.
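
A sketch of what that looks like in practice, assuming transactions are enabled (phoenix.transactions.enabled=true and a running Tephra transaction manager); the ACCOUNTS table is illustrative:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class TransactionSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                 Statement stmt = conn.createStatement()) {
                // TRANSACTIONAL=true marks the table as transactional.
                stmt.executeUpdate("CREATE TABLE IF NOT EXISTS ACCOUNTS ("
                        + "ID BIGINT NOT NULL PRIMARY KEY, BALANCE DECIMAL) "
                        + "TRANSACTIONAL=true");
                try {
                    stmt.executeUpdate("UPSERT INTO ACCOUNTS VALUES (1, 90.0)");
                    stmt.executeUpdate("UPSERT INTO ACCOUNTS VALUES (2, 110.0)");
                    conn.commit();   // both writes become visible atomically
                } catch (SQLException e) {
                    conn.rollback(); // neither write is applied
                }
            }
        }
    }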

Shakil Akhtar, Ravi Magham

Chapter 7. Advanced Phoenix Concepts

Data retrieval and search performance help organizations meet customer expectations and gain business advantages over the competition. Indexes improve data access times for NoSQL as well as relational databases. Phoenix supports indexing, which we will discuss in this chapter. Along with indexing, we will see how to work with Phoenix user-defined functions (UDFs), write custom UDFs, and use the Phoenix Query Server.
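
As a taste of the indexing material, a covered secondary index on the hypothetical EVENTS table used earlier; the INCLUDE clause stores DURATION in the index so matching queries can be answered from the index alone:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class IndexSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                 Statement stmt = conn.createStatement()) {
                // A covered secondary index on HOST that also stores DURATION.
                stmt.executeUpdate("CREATE INDEX IF NOT EXISTS IDX_EVENTS_HOST "
                        + "ON EVENTS (HOST) INCLUDE (DURATION)");
                // A query like the following can now be served from the
                // index table alone, without touching the data table:
                //   SELECT HOST, DURATION FROM EVENTS WHERE HOST = 'web1'
            }
        }
    }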

Shakil Akhtar, Ravi Magham

Chapter 8. Integrating Phoenix with Other Frameworks

In previous chapters we discussed Phoenix’s fundamental constructs, querying with Phoenix, and other advanced concepts. We can also use Phoenix with other existing technologies in the Hadoop ecosystem. This chapter focuses on Phoenix integration with the Spark, Pig, Hive, and MapReduce frameworks. Phoenix is a powerful yet easy-to-use framework that integrates with Spark for real-time data analysis and with massively parallel MapReduce jobs. It can also act as a catalyst for Hive and Pig-like scripting to achieve better performance in the big data analytics space. We will discuss all these integration points available in Phoenix and how to use them effectively for massive data sets.
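
For instance, reading a Phoenix table into Spark as a DataFrame through the phoenix-spark connector might look like the following sketch; the table name, ZooKeeper URL, and columns are placeholders, and option names can vary slightly across connector versions:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class PhoenixSparkSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("phoenix-spark-sketch")
                    .getOrCreate();
            // Load a Phoenix table as a DataFrame via the phoenix-spark connector.
            Dataset<Row> df = spark.read()
                    .format("org.apache.phoenix.spark")
                    .option("table", "EVENTS")
                    .option("zkUrl", "localhost:2181")
                    .load();
            // Push further analysis into Spark.
            df.filter("DURATION > 1000").groupBy("HOST").count().show();
            spark.stop();
        }
    }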

Shakil Akhtar, Ravi Magham

Chapter 9. Tools & Tuning

We have seen how Phoenix can help us with big data analysis by providing an interface for writing simple, easy-to-use queries and by offering features for handling HBase data in an efficient way. The most important quality of any database query engine is its performance, especially as it supports increasing loads. Phoenix provides many configuration options and suggests many ways to meet performance SLAs. This chapter is all about performance and the available Phoenix tools that provide insight into what is happening inside Phoenix, so we can tune it well while handling any issues in our production environment.
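
One of the simplest of those tools is the EXPLAIN statement, which reveals the physical plan Phoenix chose (full scan, skip scan, index use) and is the usual starting point for tuning; EVENTS is again an illustrative table:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ExplainSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                 Statement stmt = conn.createStatement();
                 // EXPLAIN returns the query plan as rows of text.
                 ResultSet rs = stmt.executeQuery(
                         "EXPLAIN SELECT * FROM EVENTS WHERE HOST = 'web1'")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }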

Shakil Akhtar, Ravi Magham

Backmatter
