2018 | Book

Practical Data Science

A Guide to Building the Technology Stack for Turning Data Lakes into Business Assets

Written by: Andreas François Vermeulen

Publisher: Apress

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

Learn how to build a data science technology stack and perform good data science with repeatable methods. You will learn how to turn data lakes into business assets.
The data science technology stack demonstrated in Practical Data Science is built from components in general use in the industry. Data scientist Andreas Vermeulen demonstrates in detail how to build and provision a technology stack to yield repeatable results. He shows you how to apply practical methods to extract actionable business knowledge from data lakes consisting of data from a polyglot of data types and dimensions.

What You'll Learn

Become fluent in the essential concepts and terminology of data science and data engineering
Build and use a technology stack that meets industry criteria
Master the methods for retrieving actionable business knowledge
Coordinate the handling of polyglot data types in a data lake for repeatable results

Who This Book Is For

Data scientists and data engineers who are required to convert data from a data lake into actionable knowledge for their business, and students who aspire to be data scientists and data engineers

Table of Contents

Frontmatter

Chapter 1. Data Science Technology Stack

Abstract

The Data Science Technology Stack covers the data processing requirements in the Rapid Information Factory ecosystem. Throughout the book, I will discuss the stack as the guiding pattern.

Andreas François Vermeulen

Chapter 2. Vermeulen-Krennwallner-Hillman-Clark

Abstract

Let’s begin by constructing a customer. I have created a fictional company for which you will perform the practical data science as you progress through this book. You can execute your examples in either a Windows or Linux environment. You only have to download the desired example set.

Andreas François Vermeulen

Chapter 3. Layered Framework

Abstract

In this chapter, I will introduce you to new concepts that enable us to share insights on a common understanding and terminology. I will define the Data Science Framework in detail, while introducing the Homogeneous Ontology for Recursive Uniform Schema (HORUS). I will take you on a high-level tour of the top layers of the framework, by explaining the fundamentals of the business layer, utility layer, operational management layer, plus audit, balance, and control layers.

Andreas François Vermeulen

Chapter 4. Business Layer

Abstract

In this chapter, I define the business layer in detail, clarifying why, where, and what functional and nonfunctional requirements are presented in the data science solution. With the aid of examples, I will help you to engineer a practical business layer and advise you, as I explain the layer in detail and discuss methods to assist you in performing good data science.

Andreas François Vermeulen

Chapter 5. Utility Layer

Abstract

The utility layer is used to store repeatable practical methods of data science. The objective of this chapter is to define how the utility layer is used in the ecosystem. Utilities are the common and verified workhorses of the data science ecosystem.

Andreas François Vermeulen

Chapter 6. Three Management Layers

Abstract

This chapter is about the three management layers that are must-haves for any large-scale data science system. I will discuss them at a basic level. I suggest you scale out these management capabilities as your environment grows.

Andreas François Vermeulen

Chapter 7. Retrieve Superstep

Abstract

In this chapter, I define the Retrieve superstep as a practical method for importing a data lake, consisting of various external data sources, completely into the processing ecosystem. The Retrieve superstep is the first contact between your data science and the source systems.

Andreas François Vermeulen

Chapter 8. Assess Superstep

Abstract

The objectives of this chapter are to show you how to assess your data science data for invalid or erroneous data values.

Andreas François Vermeulen

Chapter 9. Process Superstep

Abstract

The Process superstep adapts the assessed results of the retrieved versions of the data sources into a highly structured data vault that will form the basic data structure for the rest of the data science steps. This data vault involves the formulation of a standard data amalgamation format across a range of projects.

Andreas François Vermeulen

Chapter 10. Transform Superstep

Abstract

The Transform superstep allows you, as a data scientist, to take data from the data vault and formulate answers to questions raised by your investigations. The transformation step is the data science process that converts results into insights.

Andreas François Vermeulen

Chapter 11. Organize and Report Supersteps

Abstract

This chapter will cover the Organize superstep first, then proceed to the Report superstep. The two sections will enable you, as the data scientist, first to collect the relevant information from your prepared data warehouse, to match the requirement of a specific segment of the customer’s decision makers.

Andreas François Vermeulen

Backmatter

Title: Practical Data Science
Written by: Andreas François Vermeulen
Publisher: Apress
Electronic ISBN: 978-1-4842-3054-1
Print ISBN: 978-1-4842-3053-4
DOI: https://doi.org/10.1007/978-1-4842-3054-1