Data Science Fundamentals for Python and MongoDB

Author: David Paper

Publisher: Apress

Part of: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

Build the foundational data science skills necessary to work with and better understand complex data science algorithms. This example-driven book provides complete Python coding examples to complement and clarify data science concepts, and enrich the learning experience. Coding examples include visualizations whenever appropriate. The book is a necessary precursor to applying and implementing machine learning algorithms.
The book is self-contained. All of the math, statistics, stochastic, and programming skills required to master the content are covered. In-depth knowledge of object-oriented programming isn’t required because complete examples are provided and explained.
Data Science Fundamentals for Python and MongoDB is an excellent starting point for those interested in pursuing a career in data science. As in any science, the fundamentals of data science are a prerequisite to competency. Without proficiency in mathematics, statistics, data manipulation, and coding, the path to success is “rocky” at best. The coding examples in this book are concise, accurate, and complete, and complement the data science concepts introduced.
What You'll Learn

Prepare for a career in data science
Work with complex data structures in Python
Simulate with Monte Carlo and stochastic algorithms
Apply linear algebra using vectors and matrices
Utilize complex algorithms such as gradient descent and principal component analysis
Wrangle, cleanse, visualize, and problem solve with data
Use MongoDB and JSON to work with data
Who This Book Is For

The novice yearning to break into the data science world, and the enthusiast looking to enrich, deepen, and develop data science skills through mastering the underlying fundamentals that are sometimes skipped over in the rush to be productive. Some knowledge of object-oriented programming will make learning easier.

Frontmatter

Chapter 1. Introduction

Abstract

Data science is an interdisciplinary field encompassing scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured. It draws principles from mathematics, statistics, information science, computer science, machine learning, visualization, data mining, and predictive analytics. However, it is fundamentally grounded in mathematics.

Chapter 2. Monte Carlo Simulation and Density Functions

Abstract

Monte Carlo simulation (MCS) applies repeated random sampling (randomness) to obtain numerical results for deterministic problem solving. It is widely used in optimization, numerical integration, and risk-based decision making. Probability density functions and cumulative distribution functions describe the probability distributions of random variables, and can be used in conjunction with MCS to solve deterministic problems.
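As a rough illustration of repeated random sampling applied to a deterministic problem, the sketch below estimates pi by sampling points in the unit square and counting how many fall inside the quarter circle. It is not taken from the book; the function name estimate_pi, the sample count, and the fixed seed are assumptions chosen for this example.

```python
import random


def estimate_pi(num_samples, seed=0):
    """Estimate pi by Monte Carlo sampling (hypothetical helper, not from the book)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    inside = 0
    for _ in range(num_samples):
        # Draw a point uniformly from the unit square [0, 1) x [0, 1).
        x, y = rng.random(), rng.random()
        # Count the point if it lies inside the quarter circle of radius 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle has area pi / 4, so scale the observed fraction by 4.
    return 4.0 * inside / num_samples


pi_estimate = estimate_pi(100_000)
print(pi_estimate)
```

The estimate tightens as the number of samples grows, since the sampling error shrinks roughly with the square root of the sample count.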

Chapter 3. Linear Algebra

Abstract

Linear algebra is a branch of mathematics concerning vector spaces and linear mappings between such spaces. Put simply, it explores line-like relationships. Practically every area of modern science approximates modeling equations with linear algebra. In particular, data science relies on linear algebra for machine learning, mathematical modeling, and dimensional distribution problem solving.
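To make "linear mapping" concrete, the following minimal sketch applies a 2x2 matrix to vectors and checks the additivity property A(u + v) = Au + Av. The helper mat_vec and the values of A, u, and v are assumptions made up for this example, not material from the book.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (hypothetical helper)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Example values chosen only for illustration.
A = [[2, 0],
     [1, 3]]
u = [1, 2]
v = [3, -1]

# Map the sum of the vectors, then compare with the sum of the mapped vectors.
mapped_sum = mat_vec(A, [a + b for a, b in zip(u, v)])
sum_of_mapped = [a + b for a, b in zip(mat_vec(A, u), mat_vec(A, v))]

print(mapped_sum, sum_of_mapped)
```

Both results agree, which is the defining behavior of a linear mapping under vector addition.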

Chapter 4. Gradient Descent

Abstract

Gradient descent (GD) is an algorithm that minimizes (or maximizes) functions. To apply, start at an initial set of a function’s parameter values and iteratively move toward a set of parameter values that minimize the function. Iterative minimization is achieved using calculus by taking steps in the negative direction of the function’s gradient. GD is important because optimization is a big part of machine learning. Also, GD is easy to implement, generic, and efficient (fast).
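The iteration described above can be sketched in a few lines. This is a minimal one-dimensional example under stated assumptions, not the book's implementation: the function f(x) = (x - 3)^2, the starting point, the learning rate, and the step count are all chosen here for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Minimize a one-variable function given its gradient (hypothetical helper)."""
    x = start
    for _ in range(steps):
        # Step in the negative direction of the gradient.
        x = x - learning_rate * grad(x)
    return x


def grad_f(x):
    """Gradient of the assumed example function f(x) = (x - 3)**2."""
    return 2 * (x - 3)


minimum = gradient_descent(grad_f, start=0.0)
print(minimum)
```

With these settings the iterate moves toward x = 3, where the example function reaches its minimum. A learning rate that is too large would make the steps overshoot, so the value used here is a conservative choice.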

Chapter 5. Working with Data

Abstract

This chapter details the earliest processes of data science problem solving. The first step is to identify the problem, which determines all else that needs to be done. The second step is to gather data. The third step is to wrangle (munge) data, which is critical. Wrangling is getting data into a form that is useful for machine learning and other data science problems. Of course, wrangled data will probably have to be cleaned. The fourth step is to visualize the data. Visualization helps you get to know the data and, hopefully, identify patterns.

Chapter 6. Exploring Data

Abstract

Exploring probes deeper into the realm of data. An important topic in data science is dimensionality reduction. This chapter borrows munged data from Chapter 5 to demonstrate how this works. Another topic is speed simulation. When working with large datasets, speed is of great importance. Big data is explored with a popular dataset used by academics and industry. Finally, Twitter and Web scraping are two important data sources for exploration.

Backmatter

Title: Data Science Fundamentals for Python and MongoDB
Author: David Paper
Publisher: Apress
Electronic ISBN: 978-1-4842-3597-3
Print ISBN: 978-1-4842-3596-6
DOI: https://doi.org/10.1007/978-1-4842-3597-3
