
2022 | Book

OCaml Scientific Computing

Functional Programming in Data Science and Artificial Intelligence

About this book

This book is about the harmonious synthesis of functional programming and numerical computation. It shows how the expressiveness of OCaml allows for fast and safe development of data science applications. Step by step, the authors build up to use cases drawn from many areas of Data Science, Machine Learning, and AI, and then delve into how to deploy at scale, using parallel, distributed, and accelerated frameworks to gain all the advantages of cloud computing environments.

To this end, the book is divided into three parts, each focusing on a different area. Part I begins by introducing how basic numerical techniques are performed in OCaml, including classical mathematical topics (interpolation and quadrature), statistics, and linear algebra. It moves on from using only scalar values to multi-dimensional arrays, introducing the tensor and Ndarray, core data types in any numerical computing system. It concludes with two more classical numerical computing topics, the solution of Ordinary Differential Equations (ODEs) and Signal Processing, as well as introducing the visualization module we use throughout this book. Part II is dedicated to advanced optimization techniques that are core to most current popular data science fields. We do not focus only on applications but also on the basic building blocks, starting with Algorithmic Differentiation, the most crucial building block that in turn enables Deep Neural Networks. We follow this with chapters on Optimization and Regression, also used in building Deep Neural Networks. We then introduce Deep Neural Networks as well as topic modelling in Natural Language Processing (NLP), two advanced and currently very active fields in both industry and academia. Part III collects a range of case studies demonstrating how you can build a complete numerical application quickly from scratch using Owl. The cases presented include computer vision and recommender systems.

This book is aimed at anyone with a basic knowledge of functional programming and a desire to explore the world of scientific computing, whether to explore the field in the round, to build applications for particular topics, or to deep-dive into how numerical systems are constructed. It does not assume strict ordering in reading – readers can simply jump to the topic that interests them most.

Table of Contents

Frontmatter

Numerical Techniques

Frontmatter
Chapter 1. Introduction
Abstract
We begin by briefly introducing the two key topics of this book: scientific computing and functional programming. We then introduce the main tool we will use throughout this book, the Owl library. This chapter finishes with an outline of the whole book, our target audience, and how to use the book.
Liang Wang, Jianxin Zhao, Richard Mortier
Chapter 2. Numerical Algorithms
Abstract
We begin our journey into the world of scientific computing. The most basic building block of any numerical computing library is a series of mathematical calculation functions. In this chapter, we introduce two key mathematical techniques for working with functions: interpolation and integration. We present these techniques and show how they are implemented in OCaml, and specifically how they are supported by the mathematical functions provided in the Maths module in Owl.
Liang Wang, Jianxin Zhao, Richard Mortier
Chapter 3. Statistics
Abstract
In this chapter we turn to another indispensable tool in data analysis: statistics. Statistical analysis enables us to gain insight from data, and we group the statistical functions in Owl into three topics: descriptive statistics, distributions, and hypothesis testing. In this chapter we introduce how these basic statistics are supported in OCaml.
Liang Wang, Jianxin Zhao, Richard Mortier
Chapter 4. Linear Algebra
Abstract
Linear algebra is an important area of mathematics and has many applications in numerical computing. In this chapter we first present an overview, and then focus on how to use OCaml to solve concrete problems and develop a thorough understanding of the key concepts in linear algebra. This chapter covers the classic topics in linear algebra, including Gaussian elimination, vector spaces, determinants, eigenvalues, and eigenvectors. Before discussing these topics, we provide a brief introduction to the matrix data structure supported in OCaml.
Liang Wang, Jianxin Zhao, Richard Mortier
Chapter 5. N-Dimensional Arrays
Abstract
The N-dimensional array is the fundamental building block in numerical computing libraries such as NumPy and SciPy. It is the core dense data structure, and many advanced numerical functions are built on top of it, including linear algebra, optimisation, and algorithmic differentiation. In fact, the rest of this book is built upon it. In this chapter, we introduce the Ndarray module and its core functions. We then pay particular attention to two key and frequently used functionalities: slicing and broadcasting. Finally, we briefly explain the concept of a tensor, which is similar to but different from the N-dimensional array.
Liang Wang, Jianxin Zhao, Richard Mortier
Chapter 6. Ordinary Differential Equations
Abstract
A differential equation is an equation that contains a function and one or more of its derivatives. Such equations have been studied ever since the invention of calculus, driven by applications in fields including mechanics [1], astronomy [2], geometry [3], biology [4], engineering [5], economics [6], and many more. If the function and its derivatives in a differential equation concern only one variable, we call it an Ordinary Differential Equation (ODE), and it can model a one-dimensional dynamical system. If there are multiple variables involved, it is a Partial Differential Equation (PDE). In this chapter we focus on ODEs, including their definition, numerical solution, and the tools provided in Owl.
Liang Wang, Jianxin Zhao, Richard Mortier
Chapter 7. Signal Processing
Abstract
We rely on signals such as sound and images to convey information. Signal processing is the field concerned with analysing, generating, and transforming signals. Its applications can be found in a wide range of fields: audio processing, speech recognition, image processing, communication systems, data science, etc. In this chapter we focus on the Fourier Transform, the fundamental tool in signal processing and modern numerical computing. We introduce its basic idea, and then demonstrate how to perform FFT in OCaml with examples and applications. We also cover the relationship between FFT, convolution, and filters.
Liang Wang, Jianxin Zhao, Richard Mortier

Advanced Data Analysis Techniques

Frontmatter
Chapter 8. Algorithmic Differentiation
Abstract
Differentiation is core to many scientific applications including maximising or minimising functions, solving systems of ODEs, and non-linear optimisation such as KKT optimality conditions. Algorithmic differentiation (AD) is a computer-friendly technique for performing differentiation that is both efficient and accurate. A recent important application of algorithmic differentiation is in machine learning and artificial intelligence. [1] Training a neural network involves two phases, namely forward and back propagation. The latter is essentially the calculation of the derivative of the whole neural network as a large function. [2] In this chapter, we will introduce this topic using a hands-on approach. Starting from the basic definition, we build up a simplified version of an AD engine step by step. We then move to the AD engine in Owl to present its usage and more implementation details.
Liang Wang, Jianxin Zhao, Richard Mortier
Chapter 9. Optimisation
Abstract
Optimisation is one of the most fundamental areas of numerical computing. From simple root finding to advanced machine learning, optimisation is everywhere. In this chapter, we will give you a brief overview of this topic and how the OCaml numerical library, Owl, supports basic optimisation methods.
Liang Wang, Jianxin Zhao, Richard Mortier
Chapter 10. Regression
Abstract
Regression is an important topic in statistical modelling and machine learning. It’s about modelling problems which include one or more variables (also called “features” or “predictors”) and require us to make predictions of another variable (“output variable”) based on previous values of the predictors. Regression analysis includes a wide range of models, from linear regression to isotonic regression, each with different theoretical backgrounds and applications – explaining all these models is beyond the scope of this book. Here, we focus on several common forms of regression, particularly linear and logistic regression. We introduce their basic ideas, how they are supported in the numerical library, and how to use them to solve real problems.
Liang Wang, Jianxin Zhao, Richard Mortier
Chapter 11. Neural Network
Abstract
Originally conceived as far back as the 1940s, neural networks have received a lot of attention in recent years due to frequently astounding results. In this chapter, we will demonstrate step by step how to build up a practical neural network module with what we have learned so far, especially algorithmic differentiation, optimisation, and regression. After that, we will further present Owl’s neural network module, including its design and usage. After studying the basic feedforward neural network, we will introduce some more advanced types of neural networks, including the Convolutional Neural Network, the Recurrent Neural Network, and the Generative Adversarial Network.
The original idea of a “neural network” comes from attempts by computer scientists to model how biological neural systems work. The signal processing that neurons carry out is modelled as computation, and the complex transmission and triggering of impulses are simplified to activations. In this chapter we start with the simplest neuron, the perceptron, and then go on to introduce how neural networks can be built using regression as previously introduced in Chap. 10. We then proceed to introduce the neural network module from Owl, and three popular types of neural network it supports: the convolutional neural network, the recurrent neural network, and the generative adversarial network. We will give each a quick introduction before concluding this chapter.
Liang Wang, Jianxin Zhao, Richard Mortier
Chapter 12. Vector Space Modelling
Abstract
Dominant media types on the Internet include images, audio, video, and text. Many day-to-day tasks involve analysis of text and Natural Language Processing (NLP) is a powerful tool for extracting insights from text corpora. NLP is a very large topic with many interesting challenges. In this chapter we focus on information retrieval and specifically topic modelling.
Liang Wang, Jianxin Zhao, Richard Mortier

Use Cases

Frontmatter
Chapter 13. Case Study: Image Recognition
Abstract
How can a computer take an image and answer questions like “is there a cat or a dog in this picture?” In recent years the machine learning community has made tremendous progress in tackling this problem. In particular, Deep Neural Networks (DNN) are able to achieve extraordinary performance on visual recognition tasks, matching or even exceeding human performance in some domains. Having already introduced neural networks in the previous chapter, we will now discuss one specific use case built on that module: using the InceptionV3 architecture to perform image classification.
Liang Wang, Jianxin Zhao, Richard Mortier
Chapter 14. Case Study: Instance Segmentation
Abstract
Computer vision is a field that automates tasks such as ascribing high-level descriptions to images and videos. It has been applied to a wide variety of domains ranging from the highly technical, such as automatic tagging of satellite images or analysis of medical imaging [1] [2], to the more mundane, such as categorising pictures in your phone or creating an emoji from a picture of your face. This field has seen tremendous progress since 2012, when A. Krizhevsky et al. first used deep learning in computer vision and crushed their opponents in the ImageNet challenge [3]. We have already discussed the image recognition task in the previous chapter. Here we introduce another classical computer vision task, Instance Segmentation, which labels objects within an image. We will discuss its connection with other similar applications, how the deep neural networks are constructed in OCaml, and how such a network, when loaded with pre-trained weights, can be used to process users’ input images.
Liang Wang, Jianxin Zhao, Richard Mortier
Chapter 15. Case Study: Neural Style Transfer
Abstract
Neural Style Transfer (NST) is an exciting DNN-based application that creates art. In this chapter we introduce this application in detail: its theory, its implementation, and examples of its use. NST has been extended in many ways, one of which is fast style transfer, and we also introduce how this extension works, with examples.
Liang Wang, Jianxin Zhao, Richard Mortier
Chapter 16. Case Study: Recommender System
Abstract
In this chapter, we will build a search engine, Sofia, using latent semantic analysis. Sofia is a content-based filtering system that captures the semantics contained in unstructured web text. Sofia looks for similar articles in a text corpus when a query document arrives. You will find that much of the basic theory has been covered in Chap. 12. In this chapter, we focus on introducing the full workflow of this system, and how to address a series of technical challenges in building Sofia.
Liang Wang, Jianxin Zhao, Richard Mortier
Backmatter
Metadata
Title
OCaml Scientific Computing
Authors
Liang Wang
Dr. Jianxin Zhao
Prof. Richard Mortier
Copyright Year
2022
Electronic ISBN
978-3-030-97645-3
Print ISBN
978-3-030-97644-6
DOI
https://doi.org/10.1007/978-3-030-97645-3
