2024 | Book

Fundamental Mathematical Concepts for Machine Learning in Science


About this book

This book is for individuals with a scientific background who aspire to apply machine learning within various natural science disciplines, such as physics, chemistry, biology, medicine, and psychology. It elucidates core mathematical concepts in an accessible and straightforward manner while maintaining rigorous mathematical integrity. For readers more versed in mathematics, the book includes advanced sections that are not prerequisites for a first reading. Concepts are clearly defined, and theorems are proven where pertinent. Machine learning transcends the mere implementation and training of algorithms; it encompasses the broader challenges of constructing robust datasets, validating models, addressing imbalanced datasets, and fine-tuning hyper-parameters. These topics are thoroughly examined within the text, along with the theoretical foundations underlying these methods. Rather than concentrating on particular algorithms, this book focuses on the comprehensive concepts and theories essential for their application. It stands as an indispensable resource for any scientist keen on integrating machine learning effectively into their research.

Numerous texts delve into the technical execution of machine learning algorithms, often overlooking the foundational concepts vital for fully grasping these methods. This leaves a gap that hinders the effective use of these algorithms across diverse disciplines. For instance, a firm grasp of calculus is imperative to comprehend the training processes of algorithms and neural networks, while linear algebra is essential for the application and efficient training of various algorithms, including neural networks. Without a solid mathematical base, machine learning applications are, at best, cursory and, at worst, fundamentally flawed. This book lays the foundation for a comprehensive understanding of machine learning algorithms and approaches.

Table of Contents

Frontmatter
1. Introduction
Abstract
This preliminary chapter serves as the introduction to the book. I begin by delineating the goal: bridging the gap in the machine learning literature for natural science students who may not possess an extensive computer science background. In this chapter, I outline the book's structure and emphasise that its chapters are crafted to be self-contained, enabling readers to focus on topics of specific interest. I discuss the prerequisites for understanding the content and highlight the importance of a fundamental grasp of mathematics at the undergraduate level. Furthermore, I point out optional and advanced material marked for those who wish to delve deeper.
Umberto Michelucci
2. Machine Learning: History and Terminology
Abstract
This chapter traces the evolution of machine learning (ML), from its early theoretical foundations to its contemporary applications in various scientific disciplines. Starting with Alan Turing's seminal work and the development of the perceptron in the mid-20th century, we briefly explore key milestones such as the introduction of neural networks, the impact of Minsky and Papert's book, and the resurgence of ML in the late 20th century with advancements in algorithms and computational power. The chapter highlights the transition from theoretical research to practical applications, marked by significant developments such as backpropagation, decision trees, support vector machines, and reinforcement learning. The role of ML in fields such as chemistry, physics, and biology is discussed, emphasising its transformative impact on drug discovery, high-energy physics, astrophysics, genomics, and proteomics. The chapter concludes by categorising ML into various types: supervised, unsupervised, semi-supervised, and reinforcement learning, each of which plays a distinct role in advancing scientific knowledge.
Umberto Michelucci
3. Calculus and Optimisation for Machine Learning
Abstract
This chapter delves into the fundamental concepts of calculus and optimisation related to machine learning, offering both theoretical insights and practical use cases. Starting with the motivation behind using calculus in machine learning, the chapter systematically introduces the concept of a limit, which lays the foundation for understanding derivatives and their properties. The discussion on derivatives extends to their role in partial differentiation and gradients, both crucial for optimising machine learning algorithms. A significant portion of the chapter is dedicated to optimisation techniques specifically tailored for neural networks. This part begins with an overview of definitions of learning and their implications for neural network training. This is followed by an examination of constrained versus unconstrained optimisation and an exploration of the complexities in identifying absolute and local minima of functions. The latter part of the chapter focuses on various optimisation algorithms, briefly discussing line search and trust region methods, then transitioning into specific approaches such as steepest descent and gradient descent. The importance of selecting an appropriate learning rate is discussed, along with variations of gradient descent (GD) and strategies for choosing the right mini-batch size. The chapter concludes by exploring the connection between Stochastic Gradient Descent (SGD) and fractals and its implications for machine learning, providing a fascinating example of how complexity can emerge from very simple optimisation problems.
Umberto Michelucci
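As an illustration of the gradient descent update this chapter covers, here is a minimal sketch assuming a simple quadratic objective; the function, learning rate, and starting point are illustrative choices of mine, not examples taken from the book.

    import numpy as np

    def gradient_descent(grad, x0, learning_rate=0.05, n_steps=100):
        """Minimise a function by repeatedly stepping against its gradient."""
        x = np.asarray(x0, dtype=float)
        for _ in range(n_steps):
            x = x - learning_rate * grad(x)  # update rule: x <- x - eta * grad f(x)
        return x

    # Example: f(x, y) = x^2 + 10 y^2, with gradient (2x, 20y); minimum at the origin.
    grad_f = lambda p: np.array([2.0 * p[0], 20.0 * p[1]])
    print(gradient_descent(grad_f, x0=[3.0, -2.0]))  # close to [0, 0]

With this particular objective, any learning rate above 0.1 makes the y-coordinate diverge, a small demonstration of why the choice of learning rate matters.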
4. Linear Algebra
Abstract
This chapter provides an essential introduction to linear algebra, tailored to improve understanding of its importance in machine learning. It begins by elucidating the fundamental concepts of vectors and matrices, the essential building blocks, and delves into their various operations, from addition, subtraction, and multiplication to more advanced procedures such as the dot and cross products. The discussion then extends to the critical notions of eigenvectors and eigenvalues, highlighting their significance in matrix analysis and in the optimisation problems prevalent in machine learning. Central to the chapter is the exploration of Principal Component Analysis (PCA), a key technique for dimensionality reduction built directly on eigenvectors and eigenvalues. The chapter emphasises the practical application of these linear algebraic concepts in deep learning, illustrating their indispensability in algorithm design and model optimisation.
Umberto Michelucci
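To make the link between eigendecomposition and PCA concrete, here is a minimal sketch of my own (not code from the book): centre the data, form the covariance matrix, and project onto the eigenvectors with the largest eigenvalues.

    import numpy as np

    def pca(X, n_components=2):
        """Project the rows of X onto the top principal components."""
        X_centred = X - X.mean(axis=0)             # centre each feature
        cov = np.cov(X_centred, rowvar=False)      # covariance matrix of the features
        eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: for symmetric matrices
        order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
        components = eigvecs[:, order[:n_components]]
        return X_centred @ components              # coordinates in the reduced space

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    print(pca(X).shape)  # (100, 2)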
5. Statistics and Probability for Machine Learning
Abstract
This chapter delves into the critical role of statistics and probability in machine learning, starting with an overview of random experiments and random variables. It progresses to cover essential topics such as set theory, probability, and conditional probability, along with key theorems such as Bayes' Theorem and the Central Limit Theorem. The discussion extends to distribution functions, the significance of expected values and variance, and the normal distribution, along with a variety of other distributions relevant to machine learning. The chapter also introduces Moment Generating Functions as vital tools in probability theory, providing a foundation for understanding the Central Limit Theorem's implications for data analysis and prediction in machine learning. By exploring these statistical concepts, the chapter aims to equip readers with the necessary knowledge to effectively engage with machine learning models and understand the statistical underpinnings of algorithm performance and data analysis.
Umberto Michelucci
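As a small numerical illustration of the Central Limit Theorem covered in this chapter (my own illustration, not an example from the book): standardised means of draws from a strongly skewed distribution behave like a standard normal.

    import numpy as np

    rng = np.random.default_rng(42)
    n, trials = 1000, 5000

    # Sample means of n exponential(1) draws; the exponential is strongly skewed.
    means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

    # Standardise: (mean - mu) / (sigma / sqrt(n)), with mu = sigma = 1 for exp(1).
    z = (means - 1.0) / (1.0 / np.sqrt(n))

    # For a standard normal, about 68.3% of the mass lies within one sigma.
    print(np.mean(np.abs(z) < 1.0))  # close to 0.683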
6. Sampling Theory (a.k.a. Creating a Dataset Properly)
Abstract
This chapter dives into sampling theory, which is very important if you are working on anything from digital technology to health sciences. Think of it as choosing the pie slices that give you a real taste of the whole pie: sampling theory deals with the challenge of creating representative subsets of a larger population, which in turn lets you make solid inferences and build or test machine learning models that actually answer your research question. First, we discuss why it is critical to have clear research questions and hypotheses before you start collecting data; getting these right helps you figure out exactly which data you need to prove or disprove your point. Then, we break down survey sampling into two types, non-probability and probability sampling; each has its own way of picking data samples and is used for different reasons depending on what you are trying to study. We also discuss how to group your data (stratification and clustering) and look at different ways to pick random samples, either by making sure that each piece of data gets chosen at most once or by possibly choosing the same piece more than once (sampling without and with replacement).
Umberto Michelucci
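A minimal sketch of two of the ideas from this chapter, sampling with versus without replacement and proportionate stratified sampling; the population and the stratum labels are made up purely for illustration.

    import numpy as np

    rng = np.random.default_rng(7)
    population = np.arange(1000)
    strata = population % 4            # hypothetical stratum label for each unit

    # Without replacement: each unit can be drawn at most once.
    simple = rng.choice(population, size=100, replace=False)

    # With replacement: the same unit may appear more than once.
    bootstrap = rng.choice(population, size=100, replace=True)

    # Proportionate stratified sampling: sample within each stratum separately.
    sample = np.concatenate([
        rng.choice(population[strata == s], size=25, replace=False)
        for s in np.unique(strata)
    ])
    print(len(simple), len(bootstrap), len(sample))  # 100 100 100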
7. Model Validation and Selection
Abstract
This chapter addresses the fundamental aspects of model validation and selection in the field of machine learning. It begins by discussing the concept of model validation, emphasising its critical role in assessing a model's ability to generalise to new, unseen data. The phenomena of overfitting and underfitting are explored, along with an in-depth discussion of the bias-variance trade-off. The chapter then discusses various cross-validation techniques, including the hold-out approach, k-fold cross-validation, the leave-one-out method, and Monte Carlo cross-validation, each catering to specific scenarios and data characteristics. The selection of appropriate validation methods is discussed in the context of supervised and unsupervised learning models. Furthermore, the chapter delves into the complexities of model selection, highlighting the importance of balancing quantitative measures with qualitative criteria such as interpretability, domain relevance, and ethical considerations. This comprehensive exploration aims to equip practitioners with the knowledge and tools necessary for effective model evaluation and selection, ensuring the development of robust, reliable, and ethically sound machine learning models.
Umberto Michelucci
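A hand-rolled sketch of k-fold cross-validation as described here; in practice one would use a library routine, and this version only shows the index bookkeeping: every sample lands in the validation set exactly once across the k folds.

    import numpy as np

    def k_fold_indices(n_samples, k=5, seed=0):
        """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
        idx = np.random.default_rng(seed).permutation(n_samples)
        folds = np.array_split(idx, k)
        for i in range(k):
            val_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
            yield train_idx, val_idx

    # With 20 samples and 4 folds, each split trains on 15 and validates on 5.
    for train_idx, val_idx in k_fold_indices(20, k=4):
        print(len(train_idx), len(val_idx))  # 15 5, printed four times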
8. Unbalanced Datasets and Machine Learning Metrics
Abstract
This chapter explores the challenges and solutions associated with unbalanced datasets in machine learning. It begins by defining what constitutes an unbalanced dataset and emphasising how prevalent such datasets are. The chapter introduces the concept of machine learning metrics, vital for evaluating the performance of models trained on such datasets. It presents a simple example to illustrate how traditional models fail in the face of extreme class imbalances and how this leads to misleading accuracy metrics. The core of the chapter delves into various approaches to addressing dataset imbalance, focusing primarily on data-level approaches such as oversampling and undersampling. It discusses the advantages and disadvantages of these techniques and highlights their practical applications through examples. The Synthetic Minority Oversampling Technique (SMOTE) is introduced as a sophisticated method for generating synthetic samples to balance datasets. Moreover, the chapter covers crucial metrics for assessing model performance in the context of unbalanced datasets, including the confusion matrix, sensitivity, specificity, precision, the Fβ-score, and balanced accuracy. The discussion extends to the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC), providing a comprehensive framework for evaluating and enhancing model performance in situations of class imbalance.
Umberto Michelucci
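A minimal sketch of the metrics listed above, computed from a binary confusion matrix with invented counts for a rare positive class; it also shows how plain accuracy can mislead under imbalance.

    # Hypothetical confusion-matrix counts for a rare positive class.
    tp, fn, fp, tn = 8, 2, 90, 900

    sensitivity = tp / (tp + fn)              # recall on the positive class
    specificity = tn / (tn + fp)              # recall on the negative class
    precision   = tp / (tp + fp)
    accuracy    = (tp + tn) / (tp + fn + fp + tn)
    balanced_accuracy = (sensitivity + specificity) / 2

    beta = 2.0                                # F-beta weights recall beta times as much as precision
    f_beta = (1 + beta**2) * precision * sensitivity / (beta**2 * precision + sensitivity)

    # Plain accuracy looks high even though precision is poor: the imbalance hides it.
    print(f"accuracy={accuracy:.3f}, balanced={balanced_accuracy:.3f}, "
          f"precision={precision:.3f}, F{beta:g}={f_beta:.3f}")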
9. Hyper-parameter Tuning
Abstract
Hyper-parameters can be loosely defined as those parameters that are not changed during the training process: for example, the number of layers in a feed-forward neural network (FFNN), the number of neurons in each layer, the activation functions, the learning rate, and so on. In this chapter I will discuss the problem of finding the best hyper-parameters to get the best results from your models. Doing this is called hyper-parameter tuning. I will first describe what a black-box optimisation problem is and how this class of problems relates to hyper-parameter tuning. I will then discuss the two best-known methods for tackling these kinds of problems: grid search and random search. You will understand, with examples, which one works under which conditions, and learn a few very helpful tricks, such as coarse-to-fine optimisation and sampling on a logarithmic scale. At the end of the chapter, you should know what hyper-parameter tuning is, why it is important, and how it works.
Umberto Michelucci
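A minimal sketch of random search combined with sampling on a logarithmic scale for the learning rate; the objective function and the parameter ranges are hypothetical placeholders of mine, not taken from the book.

    import numpy as np

    rng = np.random.default_rng(1)

    def validation_loss(learning_rate, n_layers):
        """Hypothetical stand-in for training a model and measuring its loss."""
        return (np.log10(learning_rate) + 2.5) ** 2 + 0.1 * abs(n_layers - 4)

    best = None
    for _ in range(50):
        # Log-scale sampling: uniform in the exponent, so 1e-5..1e-1 is covered evenly.
        lr = 10 ** rng.uniform(-5, -1)
        n_layers = int(rng.integers(1, 9))
        loss = validation_loss(lr, n_layers)
        if best is None or loss < best[0]:
            best = (loss, lr, n_layers)

    print(f"best loss {best[0]:.4f} at lr={best[1]:.2e}, layers={best[2]}")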
10. Feature Importance and Selection
Abstract
This chapter offers an in-depth exploration of various methods used to assess feature importance in machine learning models. It opens by explaining why identifying the key features of a machine learning model matters, using examples from the healthcare and finance sectors. The chapter categorises feature importance assessment methods into three types: filter, wrapper, and embedded methods. Filter methods are preprocessing steps that select features independently of the model; examples include the variance threshold, the Chi-square test, information gain, and the correlation coefficient. These methods are notable for their computational efficiency and scalability. The section on wrapper methods delves into techniques such as recursive feature elimination, forward feature selection, backward feature elimination, exhaustive feature selection, and information content elimination. These methods are flexible and consider interactions between features, which can lead to better model performance, but at the cost of higher computational intensity. Embedded methods, such as LASSO, ridge regression, elastic net, and decision trees, are integrated into the model training process and specific to each model, making them somewhat challenging to interpret in scientific contexts. The chapter also provides a detailed explanation of forward feature selection, backward feature elimination, information content elimination, and permutation feature importance.
Umberto Michelucci
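A minimal sketch of permutation feature importance as characterised above; the toy "model" and its score function are hypothetical stand-ins for any fitted predictor with a quality score.

    import numpy as np

    def permutation_importance(score_fn, X, y, seed=0):
        """Drop in score when each feature column is shuffled, one at a time."""
        rng = np.random.default_rng(seed)
        baseline = score_fn(X, y)
        importances = []
        for j in range(X.shape[1]):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])          # break the feature-target link
            importances.append(baseline - score_fn(X_perm, y))
        return np.array(importances)

    # Toy "model": y depends on feature 0 only; the score is negative squared error.
    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 3))
    y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)
    score = lambda X, y: -np.mean((y - 2.0 * X[:, 0]) ** 2)
    print(permutation_importance(score, X, y))  # feature 0 dominates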
Backmatter
Metadata
Title
Fundamental Mathematical Concepts for Machine Learning in Science
Author
Umberto Michelucci
Copyright Year
2024
Electronic ISBN
978-3-031-56431-4
Print ISBN
978-3-031-56430-7
DOI
https://doi.org/10.1007/978-3-031-56431-4
