
2017 | Book

Docker for Data Science

Building Scalable and Extensible Data Infrastructure Around the Jupyter Notebook Server


About this book

Learn Docker "infrastructure as code" technology to define a system for performing standard but non-trivial data tasks on medium- to large-scale data sets, using Jupyter as the master controller.
It is not uncommon for a real-world data set to resist easy management. The set may not fit into available memory, or may require prohibitively long processing. These are significant challenges even for skilled software engineers, and they can render the standard Jupyter system unusable.

As a solution to this problem, Docker for Data Science proposes using Docker. You will learn how to use existing pre-compiled public images created for the major open-source technologies—Python, Jupyter, Postgres—as well as how to use the Dockerfile to extend these images to suit your specific purposes. The Docker Compose technology is examined, and you will learn how it can be used to build a linked system with Python churning data behind the scenes and Jupyter managing these background tasks. Best practices in using existing images are explored, as is developing your own images to deploy state-of-the-art machine learning and optimization algorithms.
What You'll Learn

Master interactive development using the Jupyter platform
Run and build Docker containers from scratch and from publicly available open-source images
Write infrastructure as code using the docker-compose tool and its docker-compose.yml file type
Deploy a multi-service data science application across a cloud-based system

Who This Book Is For
Data scientists, machine learning engineers, artificial intelligence researchers, Kagglers, and software developers

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
The typical data scientist consistently has a series of extremely complicated problems on their mind beyond considerations stemming from their system infrastructure. Still, it is inevitable that infrastructure issues will present themselves. To oversimplify, we might draw a distinction between the “modeling problem” and the “engineering problem.” The data scientist is uniquely qualified to solve the former, but can often come up short in solving the latter.
Joshua Cook
Chapter 2. Docker
Abstract
Docker is a way to isolate a process from the system on which it is running. It allows us to isolate the code written to define an application, and the resources required to run that application, from the hardware on which it runs. We add a layer of complexity to our software, but in doing so gain the advantage of ensuring that our local development environment will be identical to any possible environment into which we would deploy the application. If a system can run Docker, it can run our process. With the addition of a thin layer of abstraction we become hardware-independent. On its face, this would seem to be an impossible task. As of 2014, there were 285 actively maintained Linux distributions and multiple major versions of both OS X and Windows. How could we possibly write a system to allow for all possible development, testing, and production environments?
Joshua Cook
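The hardware independence described in this abstract is visible in the most basic Docker workflow: the same commands produce the same behavior on any host running a Docker engine. A minimal sketch (not a listing from the book; `alpine` is a standard public Docker Hub image, and the commands require a running Docker daemon):

```shell
# Pull a minimal public image and run a process inside it.
# The container behaves identically on any host running Docker.
docker pull alpine
docker run --rm alpine echo "same result on every host"

# Inspect the engine that makes this guarantee possible.
docker version
```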
Chapter 3. Interactive Programming
Abstract
Interactive computing is a dialog between people and machines.
Joshua Cook
Chapter 4. The Docker Engine
Abstract
If I have not emphasized this enough, the magic happens because we can count on the Docker engine to work the same way no matter our underlying hardware (or virtual hardware) and operating system. We build it using the Docker engine, we test it using the Docker engine, and we deploy it using the Docker engine.
Joshua Cook
Chapter 5. The Dockerfile
Abstract
Every Docker image is defined as a stack of layers, each defining fundamental, stateless changes to the image. The first layer might be the operating system (a Debian or Ubuntu Docker image), the next the installation of dependencies necessary for your application to run, and so on up to the source code of your application.
Joshua Cook
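The layer stack described in this abstract maps one-to-one onto the instructions in a Dockerfile. A hedged sketch, not drawn from the book's own listings (the file names `requirements.txt` and `main.py` are illustrative):

```dockerfile
# Each instruction below produces one image layer, from OS up to source code.
FROM ubuntu:16.04
# Layer: system dependencies needed to run the application.
RUN apt-get update && apt-get install -y python3 python3-pip
# Layer: the Python dependency manifest, copied in before the source
# so that dependency installation is cached across source-code changes.
COPY requirements.txt /app/
RUN pip3 install -r /app/requirements.txt
# Top layer: the application source code itself.
COPY . /app
CMD ["python3", "/app/main.py"]
```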
Chapter 6. Docker Hub
Abstract
Equipped with tools for developing our own images, it quickly becomes important to be able to save and share the images we have written beyond our system. Docker Registries allow us to do just this. For your purposes, the public Docker Registry, Docker Hub, will be more than sufficient, though it is worth noting that other registries exist and that it is possible to create and host your own registry.
Joshua Cook
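Saving and sharing an image through Docker Hub, as this abstract describes, comes down to tagging the image under your registry namespace and pushing it. A sketch under stated assumptions (the image name `my-notebook-image` and the `<username>` placeholder are hypothetical; the commands require Docker and a Docker Hub account):

```shell
# Tag a locally built image under your Docker Hub username,
# authenticate, and push it to the public registry.
docker tag my-notebook-image <username>/my-notebook-image:latest
docker login
docker push <username>/my-notebook-image:latest

# On any other machine with Docker, retrieve the same image.
docker pull <username>/my-notebook-image:latest
```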
Chapter 7. The Opinionated Jupyter Stacks
Abstract
The Jupyter Notebook is based on a set of open standards for interactive computing.
Joshua Cook
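The opinionated Jupyter stacks referred to in this chapter title are the community-maintained images under the `jupyter/` namespace on Docker Hub. A minimal sketch of launching one (the port and the `/home/jovyan/work` path follow the Jupyter Docker Stacks convention; a Docker daemon is assumed):

```shell
# Launch the SciPy Jupyter stack in the background, publishing the
# notebook server's port and mounting the current directory as the
# notebook working directory.
docker run -d -p 8888:8888 \
    -v "$PWD":/home/jovyan/work \
    jupyter/scipy-notebook
```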
Chapter 8. The Data Stores
Abstract
I propose that using Docker, it is possible to streamline the process to such an extent that using a data store for even the smallest of datasets becomes practical. I'll show you a series of best practices for designing and deploying data stores, a set of practices that will be sufficient for working with all but the largest of data sets. Conforming to Docker best practice, you will work with Docker Hub official images throughout this chapter.
Joshua Cook
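Running a data store from an official image, as this abstract proposes, can be as short as two commands. A hedged sketch using the official `postgres` image (the container name, volume name, and password are illustrative; a Docker daemon is assumed):

```shell
# Create a named volume so the database survives container restarts,
# then run the official Postgres image as a containerized data store.
docker volume create pgdata
docker run -d --name my-postgres \
    -e POSTGRES_PASSWORD=secret \
    -v pgdata:/var/lib/postgresql/data \
    -p 5432:5432 \
    postgres
```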
Chapter 9. Docker Compose
Abstract
Thus far, I have focused the discussion on single containers or individually managed pairs of containers running on the same system. In this chapter, you'll extend your ability to develop applications composed of multiple containers using the Docker Compose tool.
Joshua Cook
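The multi-container systems this chapter builds are declared in a single docker-compose.yml file. A sketch of a linked two-service system of the kind the book's blurb describes, with Jupyter in front and a data store behind it (the service names and password are illustrative, not taken from the book):

```yaml
# docker-compose.yml: Jupyter notebook server linked to a Postgres
# data store. `docker-compose up` starts both services together.
version: '3'
services:
  jupyter:
    image: jupyter/scipy-notebook
    ports:
      - "8888:8888"
    depends_on:
      - db
  db:
    image: postgres
    environment:
      POSTGRES_PASSWORD: secret
```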
Chapter 10. Interactive Software Development
Abstract
The most famous of these might be the Rails framework for the Ruby language. Rails is written from the ground up around its adopted paradigm, the Model-View-Controller design pattern, a pattern heavily favored in the implementation of user-facing software.
Joshua Cook
Backmatter
Metadata
Title
Docker for Data Science
Author
Joshua Cook
Copyright year
2017
Publisher
Apress
Electronic ISBN
978-1-4842-3012-1
Print ISBN
978-1-4842-3011-4
DOI
https://doi.org/10.1007/978-1-4842-3012-1
