2018 | Book

Practical Data Science

A Guide to Building the Technology Stack for Turning Data Lakes into Business Assets

About this book

Learn how to build a data science technology stack and perform good data science with repeatable methods. You will learn how to turn data lakes into business assets.
The data science technology stack demonstrated in Practical Data Science is built from components in general use in the industry. Data scientist Andreas Vermeulen demonstrates in detail how to build and provision a technology stack to yield repeatable results. He shows you how to apply practical methods to extract actionable business knowledge from data lakes consisting of data from a polyglot of data types and dimensions.

What You'll Learn
Become fluent in the essential concepts and terminology of data science and data engineering
Build and use a technology stack that meets industry criteria
Master the methods for retrieving actionable business knowledge
Coordinate the handling of polyglot data types in a data lake for repeatable results

Who This Book Is For

Data scientists and data engineers who are required to convert data from a data lake into actionable knowledge for their business, and students who aspire to be data scientists and data engineers

Table of Contents

Frontmatter
Chapter 1. Data Science Technology Stack
Abstract
The Data Science Technology Stack covers the data processing requirements in the Rapid Information Factory ecosystem. Throughout the book, I will discuss the stack as the guiding pattern.
Andreas François Vermeulen
Chapter 2. Vermeulen-Krennwallner-Hillman-Clark
Abstract
Let’s begin by constructing a customer. I have created a fictional company for which you will perform the practical data science as you progress through this book. You can execute your examples in either a Windows or Linux environment. You only have to download the desired example set.
Andreas François Vermeulen
Chapter 3. Layered Framework
Abstract
In this chapter, I will introduce you to new concepts that enable us to share insights on a common understanding and terminology. I will define the Data Science Framework in detail, while introducing the Homogeneous Ontology for Recursive Uniform Schema (HORUS). I will take you on a high-level tour of the top layers of the framework, by explaining the fundamentals of the business layer, utility layer, operational management layer, plus audit, balance, and control layers.
Andreas François Vermeulen
Chapter 4. Business Layer
Abstract
In this chapter, I define the business layer in detail, clarifying why, where, and what functional and nonfunctional requirements are presented in the data science solution. With the aid of examples, I will help you to engineer a practical business layer and advise you, as I explain the layer in detail and discuss methods to assist you in performing good data science.
Andreas François Vermeulen
Chapter 5. Utility Layer
Abstract
The utility layer is used to store repeatable practical methods of data science. The objective of this chapter is to define how the utility layer is used in the ecosystem. Utilities are the common and verified workhorses of the data science ecosystem.
Andreas François Vermeulen
Chapter 6. Three Management Layers
Abstract
This chapter is about the three management layers that are must-haves for any large-scale data science system. I will discuss them at a basic level. I suggest you scale out these management capabilities as your environment grows.
Andreas François Vermeulen
Chapter 7. Retrieve Superstep
Abstract
In this chapter, I define the Retrieve superstep as a practical method for importing a data lake consisting of various external data sources completely into the processing ecosystem. The Retrieve superstep is the first contact between your data science and the source systems.
Andreas François Vermeulen
Chapter 8. Assess Superstep
Abstract
The objective of this chapter is to show you how to assess your data science data for invalid or erroneous data values.
Andreas François Vermeulen
Chapter 9. Process Superstep
Abstract
The Process superstep adapts the Assess results of the Retrieve versions of the data sources into a highly structured data vault that will form the basic data structure for the rest of the data science steps. This data vault involves the formulation of a standard data amalgamation format across a range of projects.
Andreas François Vermeulen
Chapter 10. Transform Superstep
Abstract
The Transform superstep allows you, as a data scientist, to take data from the data vault and formulate answers to questions raised by your investigations. The transformation step is the data science process that converts results into insights.
Andreas François Vermeulen
Chapter 11. Organize and Report Supersteps
Abstract
This chapter will cover the Organize superstep first, then proceed to the Report superstep. The two sections will enable you, as the data scientist, first to collect the relevant information from your prepared data warehouse, to match the requirement of a specific segment of the customer’s decision makers.
Andreas François Vermeulen
Backmatter
Metadata
Title
Practical Data Science
Written by
Andreas François Vermeulen
Copyright year
2018
Publisher
Apress
Electronic ISBN
978-1-4842-3054-1
Print ISBN
978-1-4842-3053-4
DOI
https://doi.org/10.1007/978-1-4842-3054-1