
About This Book

Leverage the numerical and mathematical modules in Python and its standard library as well as popular open source numerical Python packages like NumPy, SciPy, FiPy, matplotlib and more. This fully revised edition, updated with the latest details of each package and changes to Jupyter projects, demonstrates how to numerically compute solutions and mathematically model applications in big data, cloud computing, financial engineering, business management and more.

Numerical Python, Second Edition, presents many brand-new case study examples of applications in data science and statistics using Python, along with extensions to many previous examples. Each of these demonstrates the power of Python for rapid development and exploratory computing due to its simple and high-level syntax and multiple options for data analysis.

After reading this book, readers will be familiar with many computing techniques including array-based and symbolic computing, visualization and numerical file I/O, equation solving, optimization, interpolation and integration, and domain-specific computational problems, such as differential equation solving, data analysis, statistical modeling and machine learning.

What You'll Learn

Work with vectors and matrices using NumPy

Plot and visualize data with Matplotlib

Perform data analysis tasks with Pandas and SciPy

Review statistical modeling and machine learning with statsmodels and scikit-learn

Optimize Python code using Numba and Cython

Who This Book Is For

Developers who want to understand how to use Python and its related ecosystem for numerical computing.

Table of Contents

Frontmatter

Chapter 1. Introduction to Computing with Python

Abstract
This book is about using Python for numerical computing. Python is a high-level, general-purpose interpreted programming language that is widely used in scientific computing and engineering. As a general-purpose language, Python was not specifically designed for numerical computing, but many of its characteristics make it well suited for this task. First and foremost, Python is well known for its clean and easy-to-read code syntax. Good code readability improves maintainability, which in general results in fewer bugs and better applications overall, but it also enables rapid code development. This readability and expressiveness are essential in exploratory and interactive computing, which requires fast turnaround for testing various ideas and models.
Robert Johansson

Chapter 2. Vectors, Matrices, and Multidimensional Arrays

Abstract
Vectors, matrices, and arrays of higher dimensions are essential tools in numerical computing. When a computation must be repeated for a set of input values, it is natural and advantageous to represent the data as arrays and the computation in terms of array operations. Computations that are formulated this way are said to be vectorized. Vectorized computing eliminates the need for many explicit loops over the array elements by applying batch operations on the array data. The result is concise and more maintainable code, and it enables delegating the implementation of (e.g., elementwise) array operations to more efficient low-level libraries. Vectorized computations can therefore be significantly faster than sequential element-by-element computations. This is particularly important in an interpreted language such as Python, where looping over arrays element by element entails a significant performance overhead.
Robert Johansson
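
To give a flavor of the vectorized style described in the abstract, here is a minimal NumPy sketch contrasting an explicit element-by-element loop with the equivalent batch operation; the function x**2 + 1 and the array size are arbitrary choices for this illustration.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100_000)

# Explicit element-by-element loop in Python.
y_loop = np.empty_like(x)
for i in range(x.size):
    y_loop[i] = x[i] ** 2 + 1.0

# Vectorized equivalent: one batch operation on the whole array,
# executed by NumPy's compiled low-level routines.
y_vec = x ** 2 + 1.0

assert np.allclose(y_loop, y_vec)
```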

Chapter 3. Symbolic Computing

Abstract
Symbolic computing is an entirely different paradigm in computing compared to the numerical array-based computing introduced in the previous chapter. In symbolic computing software, also known as computer algebra systems (CASs), representations of mathematical objects and expressions are manipulated and transformed analytically. Symbolic computing is mainly about using computers to automate analytical computations that can in principle be done by hand with pen and paper. However, by automating the bookkeeping and the manipulation of mathematical expressions using a computer algebra system, it is possible to take analytical computing much further than can realistically be done by hand. Symbolic computing is a great tool for checking and debugging analytical calculations that are done by hand, but more importantly it enables analytical work that may not otherwise be possible.
Robert Johansson
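
A minimal SymPy sketch of the kind of analytical manipulation a CAS automates; the expression (x + 1)**3 is an arbitrary example.

```python
import sympy

# Define a symbol and an expression, then manipulate it analytically.
x = sympy.Symbol("x")
expr = (x + 1) ** 3

expanded = sympy.expand(expr)      # x**3 + 3*x**2 + 3*x + 1
derivative = sympy.diff(expr, x)   # 3*(x + 1)**2
antiderivative = sympy.integrate(expr, x)

print(expanded, derivative, antiderivative, sep="\n")
```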

Chapter 4. Plotting and Visualization

Abstract
Visualization is a universal tool for investigating and communicating results of computational studies, and it is hardly an exaggeration to say that the end product of nearly all computations – be they numeric or symbolic – is a plot or a graph of some sort. It is when visualized in graphical form that knowledge and insights can be most easily gained from computational results. Visualization is therefore a tremendously important part of the workflow in all fields of computational studies.
Robert Johansson
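
As a small illustration of turning a computational result into a graph, here is a minimal Matplotlib sketch; the plotted function, figure size, and output file name are arbitrary choices for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot a simple computed curve and save the figure to a file.
x = np.linspace(0, 10, 200)
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("example-plot.png")
```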

Chapter 5. Equation Solving

Abstract
In the previous chapters, we have discussed general methodologies and techniques, namely, array-based numerical computing, symbolic computing, and visualization. These methods are the cornerstones of scientific computing that make up a fundamental toolset we have at our disposal when attacking computational problems.
Robert Johansson

Chapter 6. Optimization

Abstract
In this chapter, we will build on Chapter 5 about equation solving and explore the related topic of solving optimization problems. In general, optimization is the process of finding and selecting the optimal element from a set of feasible candidates. In mathematical optimization, this problem is usually formulated as determining the extreme value of a function on a given domain. An extreme value, or an optimal value, can refer to either the minimum or maximum of the function, depending on the application and the specific problem. In this chapter we are concerned with the optimization of real-valued functions of one or several variables, which optionally can be subject to a set of constraints that restricts the domain of the function.
Robert Johansson
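
A minimal sketch of unconstrained minimization of a real-valued function of two variables with SciPy; the objective function and starting point are arbitrary choices for this illustration.

```python
import numpy as np
from scipy import optimize

# Objective: a simple two-variable function with a single minimum at (1, -2).
def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

result = optimize.minimize(f, x0=np.array([0.0, 0.0]), method="BFGS")
print(result.x)  # approximately [1.0, -2.0]
```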

Chapter 7. Interpolation

Abstract
Interpolation is a mathematical method for constructing a function from a discrete set of data points. The interpolation function, or interpolant, should exactly coincide with the given data points, and it can also be evaluated for other intermediate input values within the sampled range. There are many applications of interpolation: A typical use case that provides an intuitive picture is the plotting of a smooth curve through a given set of data points. Another use case is to approximate complicated functions, which, for example, could be computationally demanding to evaluate. In that case, it can be beneficial to evaluate the original function only at a limited number of points and use interpolation to approximate the function when evaluating it at intermediate points.
Robert Johansson
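
A minimal SciPy sketch of constructing an interpolant that passes exactly through a set of data points and evaluating it at intermediate values; the underlying function sin(x) and the number of samples are arbitrary choices.

```python
import numpy as np
from scipy import interpolate

# Sampled data points (sin(x) stands in for real data).
x = np.linspace(0, 2 * np.pi, 10)
y = np.sin(x)

# Cubic spline interpolant that coincides exactly with the data points.
spline = interpolate.CubicSpline(x, y)

# Evaluate the interpolant at intermediate points within the sampled range.
x_fine = np.linspace(0, 2 * np.pi, 100)
y_fine = spline(x_fine)
```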

Chapter 8. Integration

Abstract
In this chapter we cover different aspects of integration, with the main focus on numerical integration. For historical reasons, numerical integration is also known as quadrature. Integration is significantly more difficult than its inverse operation – differentiation – and while there are many examples of integrals that can be calculated analytically, in general we have to resort to numerical methods. Depending on the properties of the integrand (the function being integrated) and the integration limits, it can be easy or difficult to numerically compute an integral. Integrals of continuous functions and with finite integration limits can in most cases be computed efficiently in one dimension, but integrable functions with singularities or integrals with infinite integration limits are examples of cases that can be difficult to handle numerically, even in a single dimension. Two-dimensional integrals (double integrals) and higher-order integrals can be numerically computed with repeated single-dimension integration or using methods that are multidimensional generalizations of the techniques used to solve single-dimensional integrals. However, the computational complexity grows quickly with the number of dimensions to integrate over, and in practice such methods are only feasible for low-dimensional integrals, such as double integrals or triple integrals. Integrals of higher dimension than that often require completely different techniques, such as Monte Carlo sampling algorithms.
Robert Johansson
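
A minimal quadrature sketch with SciPy, here for an integral with an infinite upper limit whose exact value is known, so the numerical result can be checked; the integrand is an arbitrary example.

```python
import numpy as np
from scipy import integrate

# Numerically integrate exp(-x**2) from 0 to infinity; the exact value
# is sqrt(pi)/2, which lets us check the quadrature result.
value, abs_error_estimate = integrate.quad(lambda x: np.exp(-x ** 2), 0, np.inf)
print(value, np.sqrt(np.pi) / 2)
```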

Chapter 9. Ordinary Differential Equations

Abstract
Equations wherein the unknown quantity is a function, rather than a variable, and that involve derivatives of the unknown function, are known as differential equations. An ordinary differential equation is a special case where the unknown function has only one independent variable with respect to which derivatives occur in the equation. If, on the other hand, derivatives with respect to more than one variable occur in the equation, then it is known as a partial differential equation, and that is the topic of Chapter 11. Here we focus on ordinary differential equations (abbreviated ODEs in the following), and in this chapter we explore both symbolic and numerical methods for solving this type of equation. Analytical closed-form solutions to ODEs often do not exist, but for many special types of ODEs, there are analytical solutions, and in those cases, there is a chance that we can find solutions using symbolic methods. If that fails, we must as usual resort to numerical techniques.
Robert Johansson
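
A minimal sketch of numerically solving an ODE initial value problem with SciPy; the equation dy/dt = -2y with y(0) = 1 is an arbitrary example whose analytical solution exp(-2t) is known.

```python
import numpy as np
from scipy import integrate

# Right-hand side of the ODE dy/dt = -2*y.
def rhs(t, y):
    return -2.0 * y

solution = integrate.solve_ivp(rhs, t_span=(0.0, 2.0), y0=[1.0],
                               t_eval=np.linspace(0.0, 2.0, 21))
print(solution.y[0])  # numerical approximation of exp(-2*t)
```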

Chapter 10. Sparse Matrices and Graphs

Abstract
We have already seen numerous examples of arrays and matrices being the essential entities in many aspects of numerical computing. So far we have represented arrays with the NumPy ndarray data structure, which is a dense representation that stores all the elements of the array that it represents. In many cases, this is the most efficient way to represent an object such as a vector, matrix, or a higher-dimensional array. However, notable exceptions are matrices where most of the elements are zeros. Such matrices are known as sparse matrices, and they occur in many applications, for example, in connection networks (such as circuits) and in large algebraic equation systems that arise, for example, when solving partial differential equations (see Chapter 11 for examples).
Robert Johansson
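
A minimal sketch of building a sparse matrix with SciPy; a tridiagonal matrix, of the kind that arises from discretized differential equations, is used as an arbitrary example.

```python
import numpy as np
from scipy import sparse

# A tridiagonal matrix is a typical sparse matrix: most entries are zero,
# so only the nonzero values and their positions need to be stored.
n = 1000
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sparse.diags([off, main, off], offsets=[-1, 0, 1], format="csr")

print(A.nnz, "nonzero elements out of", n * n)
```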

Chapter 11. Partial Differential Equations

Abstract
Partial differential equations (PDEs) are multivariate differential equations in which the unknown function depends on more than one independent variable and derivatives with respect to more than one of these variables occur. That is, the derivatives in the equation are partial derivatives. As such they are generalizations of ordinary differential equations, which were covered in Chapter 9. Conceptually, the difference between ordinary and partial differential equations is not that big, but the computational techniques required to deal with ODEs and PDEs are very different, and solving PDEs is typically much more computationally demanding. Most techniques for solving PDEs numerically are based on the idea of discretizing the problem in each independent variable that occurs in the PDE, thereby recasting the problem into an algebraic form. This usually results in very large-scale linear algebra problems. Two common techniques for recasting PDEs into algebraic form are the finite-difference methods (FDMs), where the derivatives in the problem are approximated with their finite-difference formulas, and the finite-element methods (FEMs), where the unknown function is written as a linear combination of simple basis functions that can be differentiated and integrated easily. The unknown function is described by a set of coefficients for the basis functions in this representation, and by a suitable rewriting of the PDEs, we can obtain algebraic equations for these coefficients.
Robert Johansson
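
A minimal finite-difference sketch, assuming the one-dimensional Poisson problem u''(x) = -1 with u(0) = u(1) = 0 as an arbitrary example, showing how discretization recasts a differential equation as a sparse linear system.

```python
import numpy as np
from scipy import sparse
from scipy.sparse import linalg as splinalg

# Finite-difference discretization of u''(x) = -1, u(0) = u(1) = 0,
# on N interior grid points, which turns the problem into a sparse
# linear system A u = b.
N = 100
dx = 1.0 / (N + 1)
main = -2.0 * np.ones(N)
off = np.ones(N - 1)
A = sparse.diags([off, main, off], offsets=[-1, 0, 1], format="csc") / dx ** 2
b = -np.ones(N)

u = splinalg.spsolve(A, b)  # approximates u(x) = x * (1 - x) / 2
```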

Chapter 12. Data Processing and Analysis

Abstract
In the last several chapters, we have covered the main topics of traditional scientific computing. These topics provide a foundation for most computational work. Starting with this chapter, we move on to explore data processing and analysis, statistics, and statistical modeling. As the first step in this direction, we look at the data analysis library pandas. This library provides convenient data structures for representing series and tables of data and makes it easy to transform, split, merge, and convert data. These are important steps in the process of cleansing raw data into a tidy form that is suitable for analysis. The pandas library builds on top of NumPy and complements it with features that are particularly useful when handling data, such as labeled indexing, hierarchical indices, alignment of data for comparison and merging of datasets, handling of missing data, and much more. As such, the pandas library has become a de facto standard library for high-level data processing in Python, especially for statistics applications. The pandas library itself contains only limited support for statistical modeling (namely, linear regression). For more involved statistical analysis and modeling, there are other packages available, such as statsmodels, patsy, and scikit-learn, which we cover in later chapters. However, even for statistical modeling with these packages, pandas can still be used for data representation and preparation. The pandas library is therefore a key component in the software stack for data analysis with Python.
Robert Johansson
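
A minimal pandas sketch of labeled tabular data and a group-by aggregation; the data values are invented purely for illustration.

```python
import pandas as pd

# A small DataFrame with labeled columns, plus a group-by aggregation
# of the kind used when transforming and summarizing tabular data.
df = pd.DataFrame({"city": ["Tokyo", "Tokyo", "Oslo", "Oslo"],
                   "month": ["Jan", "Jul", "Jan", "Jul"],
                   "temperature": [5.2, 25.0, -4.3, 16.4]})

print(df.groupby("city")["temperature"].mean())
```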

Chapter 13. Statistics

Abstract
Statistics has long been a field of mathematics that is relevant to practically all applied disciplines of science and engineering, as well as business, medicine, and other fields where data is used for obtaining knowledge and making decisions. With the recent proliferation of data analytics, there has been a surge of renewed interest in statistical methods. Still, computer-aided statistics has a long history, and it is a field that traditionally has been dominated by domain-specific software packages and programming environments, such as the S language, and more recently its open source counterpart: the R language. The use of Python for statistical analysis has grown rapidly over the last several years, and by now there is a mature collection of statistical libraries for Python. With these libraries Python can match the performance and features of domain-specific languages in many areas of statistics, albeit not all, while also providing the unique advantages of the Python programming language and its environment. The pandas library that we discussed in Chapter 12 is an example of a development within the Python community that was strongly influenced by statistical software, with the introduction of the data frame data structure to the Python environment. The NumPy and SciPy libraries provide computational tools for many fundamental statistical concepts, and higher-level statistical modeling and machine learning are covered by the statsmodels and scikit-learn libraries, which we will see more of in the following chapters.
Robert Johansson

Chapter 14. Statistical Modeling

Abstract
In the previous chapter, we covered basic statistical concepts and methods. In this chapter we build on the foundation laid out in the previous chapter and explore statistical modeling, which deals with creating models that attempt to explain data. A model can have one or several parameters, and we can use a fitting procedure to find the values of the parameters that best explain the observed data. Once a model has been fitted to data, it can be used to predict the values of new observations, given the values of the independent variables of the model. We can also perform statistical analysis on the data and the fitted model and try to answer questions such as whether the model accurately explains the data, which factors in the model are more relevant (predictive) than others, and whether there are parameters that do not contribute significantly to the predictive power of the model.
Robert Johansson
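
A minimal statsmodels sketch of fitting a model to data and predicting new observations; the synthetic data and the linear formula y ~ x are arbitrary choices for this illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: y depends linearly on x plus noise (invented for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 1.5 * x + 2.0 + rng.normal(scale=1.0, size=x.size)
data = pd.DataFrame({"x": x, "y": y})

# Fit an ordinary least-squares model and inspect the estimated parameters.
result = smf.ols("y ~ x", data=data).fit()
print(result.params)

# Predict the response for new values of the independent variable.
print(result.predict(pd.DataFrame({"x": [12.0, 15.0]})))
```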

Chapter 15. Machine Learning

Abstract
In this chapter we explore machine learning. This topic is closely related to statistical modeling, which we considered in Chapter 14, in the sense that both deal with using data to describe and predict outcomes of uncertain or unknown processes. However, while statistical modeling emphasizes the model used in the analysis, machine learning sidesteps the model part and focuses on algorithms that can be trained to predict the outcome of new observations. In other words, the approach taken in statistical modeling emphasizes understanding how the data is generated, by devising models and tuning their parameters by fitting to the data. If the model is found to fit the data well and if it satisfies the relevant model assumptions, then the model gives an overall description of the process, and it can be used to compute statistics with known distributions and for evaluating statistical tests. However, if the actual data is too complex to be explained using available statistical models, this approach has reached its limits. In machine learning, on the other hand, the actual process that generates the data, and potential models thereof, is not central. Instead, the observed data and the explanatory variables are the fundamental starting point of a machine-learning application. Given data, machine-learning methods can be used to find patterns and structure in the data, which can be used to predict the outcome of new observations. Machine learning therefore does not provide an understanding of how data was generated, and because fewer assumptions are made regarding the distribution and statistical properties of the data, we typically cannot compute statistics and perform statistical tests regarding the significance of certain observations. Instead, machine learning puts a strong emphasis on the accuracy with which new observations are predicted.
Robert Johansson
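
A minimal scikit-learn sketch of the train-then-predict workflow described above, using the bundled iris dataset and a random forest classifier as arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Train a classifier on one part of the data and evaluate predictive
# accuracy on held-out observations.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # fraction of correctly predicted test samples
```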

Chapter 16. Bayesian Statistics

Abstract
In this chapter, we explore an alternative interpretation of statistics – Bayesian statistics – and the methods associated with this interpretation. Bayesian statistics, in contrast to the frequentist statistics that we used in Chapter 13 and Chapter 14, treats probability as a degree of belief rather than as a measure of proportions of observed outcomes. This different point of view gives rise to distinct statistical methods that we can use in problem-solving. While it is generally true that statistical problems can in principle be solved using either frequentist or Bayesian statistics, there are practical differences that make these two approaches to statistics suitable for different types of problems.
Robert Johansson
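
A minimal sketch of a Bayesian update using a conjugate prior, which sidesteps the sampling machinery the chapter may rely on; the coin-flip counts are invented for illustration.

```python
from scipy import stats

# Conjugate Bayesian update for a coin-flip probability: a Beta(1, 1) prior
# (degree of belief) combined with observed data gives a Beta posterior.
heads, tails = 7, 3
posterior = stats.beta(1 + heads, 1 + tails)

print(posterior.mean())          # posterior mean of the head probability
print(posterior.interval(0.95))  # 95% credible interval
```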

Chapter 17. Signal Processing

Abstract
In this chapter we explore signal processing, which is a subject with applications in diverse branches of science and engineering. A signal in this context can be a quantity that varies in time (temporal signal) or as a function of space coordinates (spatial signal). For example, an audio signal is a typical example of a temporal signal, while an image is a typical example of a spatial signal in two dimensions. In reality, signals are often continuous functions, but in computational applications, it is common to work with discretized signals, where the original continuous signal is sampled at uniformly spaced discrete points. The sampling theorem gives rigorous and quantitative conditions for when a continuous signal can be accurately represented by a discrete sequence of samples.
Robert Johansson
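
A minimal sketch of working with a uniformly sampled temporal signal and its discrete Fourier transform in SciPy; the 5 Hz sine wave and 100 Hz sampling rate are arbitrary choices.

```python
import numpy as np
from scipy import fft

# A temporal signal sampled at uniform intervals: a 5 Hz sine wave
# sampled at 100 Hz.
fs = 100.0
t = np.arange(0, 1.0, 1.0 / fs)
signal = np.sin(2 * np.pi * 5.0 * t)

# Discrete Fourier transform of the sampled signal and the corresponding
# frequency axis; the spectrum peaks at 5 Hz.
spectrum = fft.rfft(signal)
freqs = fft.rfftfreq(signal.size, d=1.0 / fs)
print(freqs[np.argmax(np.abs(spectrum))])
```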

Chapter 18. Data Input and Output

Abstract
In nearly all scientific computing and data analysis applications, there is a need for data input and output. This includes loading datasets and persistently storing results to files on disk or to databases. Getting data in and out of programs is consequently a key step in the computational workflow. There are many standardized formats for storing structured and unstructured data. The benefits of using standardized formats are obvious: You can use existing libraries for reading and writing data, saving yourself both time and effort. In the course of working with scientific and technical computing, it is likely that you will face a variety of data formats through interaction with colleagues and peers or when acquiring data from sources such as equipment and databases. As a computational practitioner, it is important to be able to handle data efficiently and seamlessly, regardless of which format it comes in. This motivates why this entire chapter is devoted to the topic of data input and output.
Robert Johansson
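
A minimal sketch of writing and reading data in two standardized formats with NumPy and pandas; the file names and data values are arbitrary choices for this illustration.

```python
import numpy as np
import pandas as pd

# Write and read back data as a binary NumPy file and as a plain-text CSV file.
data = np.random.rand(5, 3)
np.save("data.npy", data)
restored = np.load("data.npy")

df = pd.DataFrame(data, columns=["a", "b", "c"])
df.to_csv("data.csv", index=False)
df_restored = pd.read_csv("data.csv")
```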

Chapter 19. Code Optimization

Abstract
In this book we have explored various topics of scientific and technical computing using Python and its ecosystem of libraries. As touched upon in the very first chapter of this book, the Python environment for scientific computing generally strikes a good balance between a high-level environment that is suitable for exploratory computing and rapid prototyping – which minimizes development effort – and high-performance numerics, which minimizes application runtimes. High-performance numerics is achieved not through the Python language itself, but rather through leveraging libraries that contain or use externally compiled code, typically written in C or in Fortran. Because of this, in computing applications that rely heavily on libraries such as NumPy and SciPy, most of the number crunching is performed by compiled code, and the performance is therefore vastly better than if the same computation were to be implemented purely in Python.
Robert Johansson
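
A minimal sketch of one optimization approach mentioned in this book, just-in-time compilation with Numba; the summation function is an arbitrary example of an explicit loop that benefits from compilation.

```python
import numpy as np
import numba

# Just-in-time compile an explicit Python loop to machine code with Numba;
# the compiled version avoids the per-element interpreter overhead.
@numba.njit
def sum_of_squares(x):
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * x[i]
    return total

x = np.random.rand(1_000_000)
print(sum_of_squares(x))
```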

Backmatter
