About this book

This textbook provides an introduction to the free software Python and its use for statistical data analysis. It covers common statistical tests for continuous, discrete and categorical data, as well as linear regression analysis and topics from survival analysis and Bayesian statistics. Working code and data for Python solutions for each test, together with easy-to-follow Python examples, can be reproduced by the reader and reinforce their immediate understanding of the topic. With recent advances in the Python ecosystem, Python has become a popular language for scientific computing, offering a powerful environment for statistical data analysis and an interesting alternative to R. The book is intended for master's and PhD students, mainly from the life and medical sciences, with a basic knowledge of statistics. As it also provides some statistics background, the book can be used by anyone who wants to perform a statistical data analysis.

Table of Contents

Frontmatter

Python and Statistics

Frontmatter

Chapter 1. Why Statistics?

Abstract
A short introduction to the field, including a list of recommended statistics books and references on the web.
Thomas Haslwanter

Chapter 2. Python

Abstract
This chapter is a kick-start into Python. It shows how to install Python under Windows, Linux, or MacOS, and walks step-by-step through documented programming examples. The most important statistics packages for Python are introduced. Tips are given to help avoid some of the problems frequently encountered while learning Python.
Thomas Haslwanter

Chapter 3. Data Input

Abstract
It may be surprising, but reading data into the system in the correct format and checking for erroneous or missing entries is often one of the most time-consuming parts of the data analysis. This chapter shows how to read data into Python from text files, Excel files, and data preprocessed by Matlab. It thus forms the link between the chapter on Python and the first chapter on statistical data analysis.
Thomas Haslwanter
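
As a rough illustration of the kind of check this abstract describes, the following sketch reads a small text table and separates usable values from erroneous or missing ones. It is not taken from the book: the sample data, the column names, and the helper read_weights are hypothetical, and only the Python standard library is used.

```python
import csv
import io

# Hypothetical sample data; the empty field and "n/a" stand in for
# missing or erroneous entries.
RAW = """subject,weight_kg
A,71.5
B,
C,n/a
D,64.0
"""

def read_weights(text):
    """Return (valid, rejected): parsed weights and the subjects whose value failed to parse."""
    valid, rejected = {}, []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            valid[row["subject"]] = float(row["weight_kg"])
        except ValueError:
            # float("") and float("n/a") both raise ValueError
            rejected.append(row["subject"])
    return valid, rejected

valid, rejected = read_weights(RAW)
print(valid)     # usable measurements
print(rejected)  # subjects that need manual inspection
```

Keeping the rejected rows, rather than silently dropping them, makes it possible to inspect why an entry failed before deciding how to treat it.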

Chapter 4. Display of Statistical Data

Abstract
This chapter shows a number of different ways to visualize statistical data sets. To this end, it first introduces different ways to generate plots in Python. Special attention is paid to interactive plots and the positioning of plots on the computer screen. Then examples are given for scatter plots, histograms, KDE-plots, and a number of other two- and three-dimensional representations of statistical data.
Thomas Haslwanter

Distributions and Hypothesis Tests

Frontmatter

Chapter 5. Background

Abstract
This chapter serves to define the statistical basics, like the concepts of populations and samples, and of probability distributions. It also includes a short overview of study design, a topic often seriously underestimated by many beginning researchers.
Thomas Haslwanter

Chapter 6. Distributions of One Variable

Abstract
This chapter shows how to characterize the position and the variability of a distribution, and then uses the normal distribution to describe the most important Python methods common to all distribution functions. Then the most important discrete and continuous distributions are presented, such as the t-distribution, chi-square-distribution, and the F-distribution.
Thomas Haslwanter
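
The ideas in this abstract can be sketched with the standard library alone. The example below is an assumption-laden stand-in, not the book's code: it uses statistics.NormalDist rather than any particular statistics package, and the data values are invented. It computes a measure of position and of variability, then evaluates the density, cumulative distribution, and inverse cumulative distribution of a standard normal distribution.

```python
from statistics import NormalDist, mean, stdev

# Invented sample values, for illustration only.
data = [4.1, 5.0, 5.3, 4.7, 5.9]

center = mean(data)    # position of the distribution
spread = stdev(data)   # variability: sample standard deviation (divides by n - 1)

# Standard normal distribution: mean 0, standard deviation 1.
nd = NormalDist(mu=0.0, sigma=1.0)

density_at_zero = nd.pdf(0.0)       # probability density function
prob_below = nd.cdf(1.96)           # cumulative distribution function
upper_quantile = nd.inv_cdf(0.975)  # inverse CDF (quantile function)

print(center, spread)
print(density_at_zero, prob_below, upper_quantile)
```

The same three operations (density, cumulative probability, quantile) apply to other distributions as well, which is why a single distribution can serve to demonstrate them.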

Chapter 7. Hypothesis Tests

Abstract
This chapter describes a typical workflow in the analysis of statistical data. Special attention is paid to visual and quantitative tests of normality for the data. Then the concept of hypothesis tests is explained, as well as the different types of errors, and the interpretation of p-values is discussed. Finally, the common test concepts of sensitivity and specificity are introduced and explained.
Thomas Haslwanter
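
Sensitivity and specificity, mentioned at the end of this abstract, follow directly from the four cells of a confusion matrix. The sketch below uses invented counts and a hypothetical helper name; it illustrates the standard definitions and is not code from the book.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute sensitivity and specificity from confusion-matrix counts.

    tp: true positives, fn: false negatives,
    tn: true negatives, fp: false positives.
    """
    sensitivity = tp / (tp + fn)  # fraction of actual positives detected
    specificity = tn / (tn + fp)  # fraction of actual negatives correctly rejected
    return sensitivity, specificity

# Invented counts: 50 diseased subjects (45 detected), 100 healthy (90 cleared).
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=90, fp=10)
print(sens, spec)
```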

Chapter 8. Tests of Means of Numerical Data

Abstract
This chapter covers hypothesis tests for the mean values of groups, and shows how to implement each of these tests in Python:
  • Comparison of one group with a fixed value.
  • Comparison of two groups with respect to each other.
  • Comparison of three or more groups with each other.
For each case, Python implementations of parametric tests are presented (which can be used for normally distributed data), as well as implementations of nonparametric tests.
Thomas Haslwanter
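
For the first case in the list above, the comparison of one group with a fixed value, the parametric test statistic can be computed by hand. The following sketch is illustrative only: the function name and the data are assumptions, and it stops at the t statistic because the standard library offers no t-distribution from which to derive a p-value.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, reference):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(data)
    standard_error = stdev(data) / sqrt(n)  # stdev uses the n - 1 denominator
    t = (mean(data) - reference) / standard_error
    return t, n - 1

# Invented measurements, compared against a fixed reference value of 4.
t_value, dof = one_sample_t([2, 4, 4, 4, 5, 5, 7, 9], reference=4)
print(t_value, dof)
```

A statistics package would additionally convert the t statistic and the degrees of freedom into a p-value; that step is deliberately left out here.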

Chapter 9. Tests on Categorical Data

Abstract
Categorical data are data that can take on one of a limited, and usually fixed, number of possible values. (A “mean value” typically makes no sense for categorical data.) This chapter covers the tests most commonly used for the analysis of categorical data: chi-square tests, Fisher’s Exact Test, McNemar’s Test, and Cochran’s Q-Test.
Thomas Haslwanter
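
To make the chi-square test of independence concrete, the sketch below computes Pearson's chi-square statistic for a contingency table from first principles. The table values and the function name are invented for illustration, no continuity correction is applied, and the conversion to a p-value is omitted; it is not the book's implementation.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof

# Invented 2x2 table: rows are groups, columns are outcomes.
chi2, dof = chi_square_statistic([[10, 20], [20, 10]])
print(chi2, dof)
```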

Chapter 10. Analysis of Survival Times

Abstract
This chapter is dedicated to “survival analysis,” which also encompasses the statistical characterization of material failures and machine breakdowns. The statistical treatment of survival analysis requires a somewhat different approach than the other hypothesis tests, as at the end of a study many individuals may still be “alive,” and the corresponding measurement value is therefore only partially known.
Thomas Haslwanter
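
The "partially known" measurements described in this abstract are usually called censored observations. One common way to account for them is the Kaplan-Meier estimator, sketched below with the standard library only. The abstract does not name this estimator, so its use here is an assumption, as are the function name and the data.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve.

    durations: time until event or censoring for each subject.
    observed:  1 if the event was observed, 0 if the subject was censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = removed = 0
        # Collect all subjects that leave the study at time t
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            removed += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= removed  # censored subjects leave the risk set without an event
    return curve

# Invented data: subjects at times 3 and 8 are censored (observed = 0).
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(curve)
```

Censored subjects contribute to the number at risk up to their censoring time but never lower the survival estimate themselves, which is how the partially known values are used without being treated as events.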

Statistical Modeling

Frontmatter

Chapter 11. Linear Regression Models

Abstract
After an introduction to Pearson’s, Spearman’s, and Kendall’s correlation coefficients, this chapter describes how to implement and solve linear regression models in Python. The resulting model parameters are discussed, as well as the assumptions of the models and interpretations of the model results. Since bootstrapping can be helpful in the evaluation of some models, the final section in this chapter shows a Python implementation of a bootstrapping example.
Thomas Haslwanter
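
A minimal sketch of the two ideas in this abstract, fitting a straight line and judging the fit by bootstrapping, might look as follows. It uses only the standard library; the data, the function names, and the choice of a percentile interval for the slope are assumptions made for illustration and do not reproduce the book's example.

```python
import random

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def bootstrap_slope_interval(x, y, n_boot=1000, seed=0):
    """Approximate 95% percentile interval for the slope by resampling (x, y) pairs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pairs = list(zip(x, y))
    slopes = []
    while len(slopes) < n_boot:
        sample = [rng.choice(pairs) for _ in pairs]  # draw with replacement
        xs, ys = zip(*sample)
        if len(set(xs)) < 2:
            continue  # slope is undefined when all resampled x values coincide
        slopes.append(fit_line(xs, ys)[0])
    slopes.sort()
    return slopes[int(0.025 * n_boot)], slopes[int(0.975 * n_boot) - 1]

# Invented, nearly linear data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
low, high = bootstrap_slope_interval(x, y)
print(slope, intercept)
print(low, high)
```

Resampling whole (x, y) pairs keeps each observation intact, so the interval reflects how much the slope would vary if a different sample had been drawn from the same population.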

Chapter 12. Multivariate Data Analysis

Abstract
This short chapter shows how the statistical properties of higher-dimensional data can be visualized: with 3D surfaces, scatterplot matrices, and correlation matrices.
Thomas Haslwanter

Chapter 13. Tests on Discrete Data

Abstract
Generalized Linear Models (GLMs) substantially extend the power of statistical modeling. This chapter introduces GLMs and shows how to implement logistic regression, a frequently used application of GLMs, with the tools provided by Python. A worked example of Ordinal Logistic Regression demonstrates how the package “scikit-learn” and the tools of machine learning can be used for statistical modeling.
Thomas Haslwanter

Chapter 14. Bayesian Statistics

Abstract
Bayesian Statistics, a technique that has become very popular for many types of machine learning, starts out with a different view of statistical data: it takes the observed data as fixed, and asks how likely particular model parameters are given those data. This chapter introduces Bayesian Statistics, and provides a worked example using the Python package “PyMC,” showing how Bayesian Statistics can provide more information than classical statistical modeling.
Thomas Haslwanter
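
The Bayesian view described in this abstract can be illustrated without any sampling package. The sketch below does not use PyMC; it is a simple grid approximation, chosen here as an assumption for brevity, with an invented data set (7 successes in 10 trials), a uniform prior, and a hypothetical function name. The observed data stay fixed while the candidate values of the parameter are varied.

```python
def grid_posterior(successes, trials, n_grid=1001):
    """Posterior over a success probability p on a regular grid, with a uniform prior."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    # Binomial likelihood of the fixed data for every candidate value of p.
    # The binomial coefficient and the uniform prior are constant in p,
    # so they cancel in the normalization below.
    likelihood = [p ** successes * (1 - p) ** (trials - successes) for p in grid]
    total = sum(likelihood)
    posterior = [value / total for value in likelihood]
    return grid, posterior

grid, posterior = grid_posterior(successes=7, trials=10)

# Summaries of the whole posterior distribution, not just a point estimate
posterior_mean = sum(p * w for p, w in zip(grid, posterior))
map_estimate = grid[posterior.index(max(posterior))]
print(posterior_mean, map_estimate)
```

Because the result is a full distribution over the parameter, it yields not only a most probable value but also a mean and, if desired, credible intervals.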

Backmatter
