
About this book

This textbook, fully updated to feature Python version 3.7, covers the key ideas that link probability, statistics, and machine learning, illustrated using Python modules. The entire text, including all the figures and numerical results, is reproducible using the Python code and associated Jupyter/IPython notebooks, which are provided as supplementary downloads. The author develops key intuitions in machine learning by working meaningful examples using multiple analytical methods and Python code, thereby connecting theoretical concepts to concrete implementations. The update features full coverage of web-based scientific visualization with Bokeh and JupyterHub; Fisher Exact, Cohen's D, and Rank-Sum Tests; Local Regression, Spline, and Additive Methods; and Survival Analysis, Stochastic Gradient Trees, and Neural Networks and Deep Learning. Modern Python modules like Pandas, Sympy, and Scikit-learn are applied to simulate and visualize important machine learning concepts like the bias/variance trade-off, cross-validation, and regularization. Many abstract mathematical ideas, such as convergence in probability theory, are developed and illustrated with numerical examples. This book is suitable for classes in probability, statistics, or machine learning and requires only rudimentary knowledge of Python programming.

Table of Contents


Chapter 1. Getting Started with Scientific Python

Python is fundamental to data science and machine learning, as well as an ever-expanding list of areas including cyber-security and web programming. The fundamental reason for Python's widespread use is that it provides the software glue that permits easy exchange of methods and data across core routines typically written in Fortran or C.
José Unpingco
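
The "software glue" point above can be sketched with NumPy, whose array operations dispatch to compiled BLAS routines while exposing a one-line Python interface. The pure-Python version below is an illustrative assumption, not code from the book:

```python
import numpy as np

# Pure-Python dot product: every multiply-add runs in the interpreter.
def dot_py(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

# NumPy's dot delegates the same computation to compiled C/Fortran
# (BLAS) routines, which is where the "glue" speedup comes from.
x = np.arange(100_000, dtype=float)
y = np.ones_like(x)

assert np.isclose(dot_py(x, y), x.dot(y))
```

On large arrays the compiled path is typically orders of magnitude faster, yet both calls look like ordinary Python.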

Chapter 2. Probability

This chapter takes a geometric view of probability theory and relates it to familiar concepts in linear algebra and geometry. This approach connects your natural geometric intuition to the key abstractions in probability that can help guide your reasoning. This is particularly important in probability because it is easy to be misled. We need a bit of rigor and some intuition to guide us.
José Unpingco
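
One way the geometric view pays off is that the best mean-square estimate of one random variable given another behaves like a vector projection, computed with the same inner-product formula used in linear algebra. The following numerical sketch is a hypothetical illustration of that idea, not an example drawn from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples of a pair (X, Y) with Y = 2*X + noise, both zero-mean.
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(scale=0.5, size=10_000)

# Geometric view: projecting Y onto X uses the coefficient
# <Y, X> / <X, X>, exactly as when projecting one vector onto another.
coef = np.dot(y, x) / np.dot(x, x)
print(coef)  # close to 2.0
```

The empirical projection coefficient recovers the slope 2 used to generate the data, connecting the probabilistic notion of a best estimate to ordinary geometry.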

Chapter 3. Statistics

To get started thinking about statistics, consider the following three famous problems:
  • Suppose you have a bag filled with colored marbles. You close your eyes, reach in, and pull out a handful of marbles. What can you say about what is in the bag?
  • You arrive in a strange town and you need a taxicab. You look out the window, and in the dark, you can just barely make out the number on the roof of one of the cabs. In this town, you know they label the cabs sequentially. How many cabs does the town have?
  • You have already taken the entrance exam twice and you want to know if it’s worth it to take it a third time in the hopes that your score will improve. Because only the last score is reported, you are worried that you may do worse the third time. How do you decide whether or not to take the test again?
José Unpingco
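
The taxicab puzzle above is the classic German tank problem. A minimal sketch of the standard frequentist answer, under the assumption that the cabs are numbered 1 through N and we observe k of them, is:

```python
import numpy as np

rng = np.random.default_rng(1)

def estimate_total(observed):
    """Minimum-variance unbiased estimate of N from serial numbers
    sampled without replacement from 1..N: m + m/k - 1, where m is
    the largest number observed and k is the sample size."""
    m, k = max(observed), len(observed)
    return m + m / k - 1

# Quick check: observe 5 cab numbers from a town with N = 200 cabs.
sample = rng.choice(np.arange(1, 201), size=5, replace=False)
print(estimate_total(sample))
```

Intuitively, the gap expected above the largest observed number matches the average gap between observations, which is where the m/k - 1 correction comes from.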

Chapter 4. Machine Learning

Machine learning is a huge and growing area. In this chapter, we cannot possibly survey it all, but we can provide some context and some connections to probability and statistics that should make it easier to think about machine learning and how to apply these methods to real-world problems. The fundamental problem of statistics is basically the same as that of machine learning: given some data, how do we make it actionable? For statistics, the answer is to construct analytic estimators using powerful theory. For machine learning, the answer is algorithmic prediction. Given a dataset, what forward-looking inferences can we draw? There is a subtle point in this description: how can we know the future if all we have is data about the past? This is the crux of the matter for machine learning, as we will explore in the chapter.
José Unpingco
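
The question of predicting the future from past data is exactly what cross-validation probes: held-out slices of the existing data stand in for the unseen future. A minimal Scikit-learn sketch on synthetic data (the dataset and model here are illustrative assumptions, not the book's examples):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)

# Synthetic "past" data: a noisy linear relationship in 5 features.
X = rng.normal(size=(200, 5))
w = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
y = X @ w + rng.normal(scale=0.3, size=200)

# Each fold trains on part of the past and tests on the held-out
# remainder, estimating how well predictions will generalize.
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
print(scores.mean())  # close to 1 when the model generalizes well
```

A high held-out score justifies forward-looking inference only insofar as future data resemble the past, which is the caveat the chapter text is pointing at.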

Chapter 5. Correction to: Probability

José Unpingco
