2.3 Traditional machine learning methods
This study applies several commonly used machine learning methods to evaluate how effectively BP can be predicted from features extracted from the PPG and ECG signals together with physical characteristics.
LASSO (least absolute shrinkage and selection operator) is a linear model with an L1 prior as regularizer (Friedman et al. 2010). Because a large number of features are used to predict BP, adding a regularization term to the linear model helps with variable selection. LASSO performs both variable selection and regularization, which can improve prediction accuracy. The amount of regularization is controlled by α, the coefficient of the L1 term, which can be determined experimentally by cross-validation during training. In this study, fivefold cross-validation is used to select α.
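The α selection described above can be sketched with scikit-learn's `LassoCV`, which fits the LASSO regularization path and picks α by k-fold cross-validation. This is a minimal sketch under stated assumptions, not the study's code: the synthetic data, the number of features and the standardization step are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real data: 200 recordings x 50 features
# (e.g. PPG/ECG-derived features plus physical characteristics).
# Only the first 3 features actually drive the synthetic "BP" target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = 120 + 8 * X[:, 0] - 5 * X[:, 1] + 3 * X[:, 2] + rng.normal(scale=2, size=200)

# Standardise features (an assumption; the text does not state the scaling),
# then let LassoCV choose alpha, the L1 coefficient, by fivefold cross-validation.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X, y)

lasso = model[-1]
selected = np.flatnonzero(lasso.coef_)  # indices of features with non-zero weight
print(f"chosen alpha = {lasso.alpha_:.4f}")
print(f"{selected.size} of {X.shape[1]} features kept:", selected)
```

Features whose coefficients are driven exactly to zero are dropped from the model, which is the variable-selection effect mentioned above.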
Support Vector Regression (SVR) is a popular machine learning model that has been shown to be an effective tool for real-valued function estimation (Drucker et al. 1996). SVR uses a symmetric, ε-insensitive loss function: errors whose absolute value is smaller than a threshold ε are ignored. As a result, the model produced by SVR depends only on a subset of the training data (the support vectors). A fivefold cross-validated grid search is used to find the optimal values of several important parameters: kernel type (linear, polynomial, radial basis function), kernel coefficient (0.1, 0.01, 0.001, 0.0001), regularization parameter (1, 0.1, 0.01, 0.001, 0.0001) and the width of the epsilon-tube (0.1, 1, 5, 10, 20), which specifies the tolerance level.
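A minimal sketch of this grid search, assuming scikit-learn's `SVR` and `GridSearchCV`. The mapping of the text's parameters onto `kernel`, `gamma`, `C` and `epsilon`, the feature standardization, the mean-absolute-error scoring and the synthetic data are assumptions rather than details reported by the study.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Grid from the text, mapped onto scikit-learn's SVR parameter names:
# kernel type, kernel coefficient (gamma), regularisation parameter (C)
# and the width of the epsilon-tube (epsilon).
param_grid = {
    "svr__kernel": ["linear", "poly", "rbf"],
    "svr__gamma": [0.1, 0.01, 0.001, 0.0001],
    "svr__C": [1, 0.1, 0.01, 0.001, 0.0001],
    "svr__epsilon": [0.1, 1, 5, 10, 20],
}
pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR())])
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_absolute_error")

# Hypothetical data: 100 samples x 5 features, target on an mmHg-like scale.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 120 + 10 * X[:, 0] + 4 * np.sin(X[:, 1]) + rng.normal(scale=3, size=100)

search.fit(X, y)  # 3 * 4 * 5 * 5 = 300 settings x 5 folds = 1500 fits
print(search.best_params_)
print(f"cross-validated MAE: {-search.best_score_:.2f}")
```

`gamma` has no effect for the linear kernel, so those grid points duplicate each other; passing a list of per-kernel grids to `GridSearchCV` would avoid the redundant fits.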
AdaBoost, short for Adaptive Boosting, is an ensemble method that fits a sequence of weak learners (simple base learning algorithms), each trained with more weight on the samples its predecessors predicted poorly, to improve performance (Drucker 1997). The final output combines the predictions of these weak learners, each weighted by its accuracy; in the regression variant of Drucker (1997) the combination is a weighted median. A decision tree regressor, a commonly used weak learner, is adopted in this study. A fivefold cross-validated grid search is then used to find the optimal values of the number of iterations (5, 50, 500), learning rate (1, 0.1, 0.01, 0.001, 0.0001) and loss function (linear, square, exponential).
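The AdaBoost set-up can be sketched with scikit-learn's `AdaBoostRegressor`, which implements Drucker's AdaBoost.R2 and, when no base estimator is given, boosts depth-3 decision tree regressors. The depth of the trees, the scoring metric and the synthetic data are assumptions; the study does not report them. The full grid is only constructed here, and one cheap configuration is fitted for illustration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.tree import DecisionTreeRegressor

# With no estimator given, AdaBoostRegressor boosts DecisionTreeRegressor(max_depth=3)
# weak learners and combines them with a weighted median (AdaBoost.R2).
param_grid = {
    "n_estimators": [5, 50, 500],  # number of boosting iterations
    "learning_rate": [1, 0.1, 0.01, 0.001, 0.0001],
    "loss": ["linear", "square", "exponential"],
}
search = GridSearchCV(AdaBoostRegressor(random_state=0), param_grid, cv=5,
                      scoring="neg_mean_absolute_error")
print(len(ParameterGrid(param_grid)), "settings x 5 folds")  # 45 settings

# One configuration fitted on hypothetical data, for illustration only;
# search.fit(X, y) would evaluate the whole grid.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))
y = 120 + 6 * X[:, 0] * X[:, 1] + rng.normal(scale=2, size=200)
model = AdaBoostRegressor(n_estimators=50, learning_rate=0.1, loss="square",
                          random_state=0).fit(X, y)
print(isinstance(model.estimators_[0], DecisionTreeRegressor), len(model.estimators_))
```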
Random forest (RF) is another ensemble method; it constructs a number of decision trees, each built from a sample drawn with replacement from the training data, and averages their predictions (Breiman 2001). The added randomness makes the individual trees less correlated, so averaging them decreases the variance of the forest estimator. A fivefold cross-validated grid search is used to find the optimal values of several important parameters, namely the number of trees (100, 150, 200, 500, 1000), the criterion used to measure the quality of a split (mean squared error, mean absolute error) and the minimum number of samples required to split an internal node (2, 3, 4, 5, 10).
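A sketch of the random forest grid, assuming scikit-learn's `RandomForestRegressor` (version 1.0 or later, where the split criteria are named `squared_error` and `absolute_error`). The data and scoring are hypothetical. The second half checks, on one cheap configuration, that the forest's prediction is simply the average of its trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, ParameterGrid

param_grid = {
    "n_estimators": [100, 150, 200, 500, 1000],  # number of trees
    # "mean squared error" / "mean absolute error"; scikit-learn >= 1.0 spells
    # these "squared_error" / "absolute_error" (older releases used "mse" / "mae").
    "criterion": ["squared_error", "absolute_error"],
    "min_samples_split": [2, 3, 4, 5, 10],
}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=5,
                      scoring="neg_mean_absolute_error", n_jobs=-1)
print(len(ParameterGrid(param_grid)), "settings")  # 50 settings

# One configuration on hypothetical data: each tree is trained on a bootstrap
# sample (bootstrap=True is the default) and the forest averages the trees.
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 6))
y = 120 + 10 * np.tanh(X[:, 0]) + 5 * X[:, 1] + rng.normal(scale=2, size=300)
forest = RandomForestRegressor(n_estimators=100, min_samples_split=2,
                               random_state=0).fit(X, y)
tree_preds = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
print(np.allclose(tree_preds.mean(axis=0), forest.predict(X[:5])))
```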
K-Nearest Neighbours (KNN) is a non-parametric method that predicts a value by taking a weighted average of the target values of the k nearest neighbours. The integer k must be specified, as must the weighting scheme and the distance metric. In this study, a fivefold cross-validated grid search is used to find the optimal values of k (1, 5, 10, 15, 20), weighting scheme (uniform, distance) and distance metric (Euclidean, Manhattan).
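A sketch of the KNN search, assuming scikit-learn's `KNeighborsRegressor` and hypothetical data. The last lines reproduce what the "distance" weighting scheme computes: an inverse-distance weighted average of the k nearest training targets.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical data: 120 samples x 4 features.
rng = np.random.default_rng(3)
X = rng.normal(size=(120, 4))
y = 120 + 8 * X[:, 0] + rng.normal(scale=2, size=120)

param_grid = {
    "n_neighbors": [1, 5, 10, 15, 20],
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(KNeighborsRegressor(), param_grid, cv=5,
                      scoring="neg_mean_absolute_error").fit(X, y)
print(search.best_params_)

# "distance" weighting: each of the k neighbours is weighted by 1 / distance.
knn = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(X, y)
x_new = rng.normal(size=(1, 4))
dist, idx = knn.kneighbors(x_new)
w = 1.0 / dist[0]
manual = np.sum(w * y[idx[0]]) / np.sum(w)
print(knn.predict(x_new)[0], manual)
```

With "uniform" weighting the same calculation reduces to the plain mean of the k neighbours' targets.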
Multi-layer Perceptron (MLP) is a typical class of feedforward neural network that can learn non-linear models. It consists of at least three layers: an input layer, one or more hidden layers and an output layer. A fivefold cross-validated grid search is used to find the optimal values of several important parameters, namely the number of hidden layers (1, 2, 3), number of nodes in the hidden layers (5, 10, 20, 50), activation function in the hidden layers (logistic sigmoid, hyperbolic tangent, ReLU), coefficient of the L2 regularization term (1, 0.1, 0.01, 0.001, 0.0001) and maximum number of iterations (100, 200, 500, 1000).
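A sketch of the MLP grid, assuming scikit-learn's `MLPRegressor`. scikit-learn describes the hidden layers as one tuple of widths, so the number of layers and the number of nodes are combined into that tuple; giving every hidden layer the same width, standardizing the inputs and the synthetic data are all assumptions. Only one configuration is fitted, since the full grid has 720 settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def make_mlp_pipeline():
    """Standardise inputs, then fit an MLP regressor (scaling is an assumption)."""
    return Pipeline([("scale", StandardScaler()), ("mlp", MLPRegressor(random_state=0))])


# Every hidden layer is assumed to have the same number of nodes.
hidden = [(n,) * depth for depth in (1, 2, 3) for n in (5, 10, 20, 50)]
param_grid = {
    "mlp__hidden_layer_sizes": hidden,
    "mlp__activation": ["logistic", "tanh", "relu"],
    "mlp__alpha": [1, 0.1, 0.01, 0.001, 0.0001],  # L2 penalty coefficient
    "mlp__max_iter": [100, 200, 500, 1000],
}
search = GridSearchCV(make_mlp_pipeline(), param_grid, cv=5,
                      scoring="neg_mean_absolute_error")
print(len(ParameterGrid(param_grid)), "settings")  # 720 settings

# One configuration on a hypothetical, already standardised target:
# two hidden layers of 20 ReLU units.
rng = np.random.default_rng(5)
X = rng.normal(size=(300, 6))
y = np.tanh(X[:, 0]) * X[:, 1] + rng.normal(scale=0.1, size=300)
model = make_mlp_pipeline().set_params(
    mlp__hidden_layer_sizes=(20, 20), mlp__activation="relu",
    mlp__alpha=0.01, mlp__max_iter=500,
).fit(X, y)
mlp = model.named_steps["mlp"]
print([w.shape for w in mlp.coefs_])  # [(6, 20), (20, 20), (20, 1)]
```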
2.4 Proposed deep learning model
This study proposes a novel deep learning model that uses the information contained in the PPG and ECG, along with physical characteristics, to predict BP. In contrast to the methods described above, which require pre-processing and feature extraction from the PPG and ECG, deep learning models can take the raw signal data directly as input, so that feature learning is embedded in the modelling process. The proposed hybrid model combines several types of neural network layers: convolutional neural network (CNN), long short-term memory (LSTM) and fully connected (Dense) layers. A Dense layer is essentially a hidden layer of an MLP.
CNNs were initially developed for image classification problems, where they receive two-dimensional image pixels as input and generate output after a series of operations that learn patterns. Multiple CNN layers are often applied in such problems so that simple patterns identified in the lower layers can be combined into more complex patterns in the higher layers (Krizhevsky et al. 2012). The same process can be applied to one-dimensional time series data, such as the PPG and ECG in this study. A one-dimensional CNN (1D CNN) can automatically learn both to extract useful features from these signals and to map them to BP.
A 1D CNN applies the convolution operation to the input data with a number of filters (also called feature detectors) (LeCun and Bengio 1995). The length of these filters can be specified and is often referred to as the kernel size. The filters are moved along the signals, and the shift size, referred to as the stride, is often chosen to be 1. Different types of padding can be applied to determine the size of the output. Zero-padding is often found to perform well in practice (Krizhevsky et al. 2012) and is also adopted in this study. An activation function is usually applied to the results of the convolution operation; ReLU is very popular and has been found to perform well in practice (Jarrett et al. 2009). Convolutional layers are often followed by dropout layers for regularization and then by pooling layers, such as max pooling or average pooling (Krizhevsky et al. 2012; Srivastava et al. 2014). CNN models tend to learn very quickly; dropout slows the learning process and reduces overfitting, potentially resulting in a better final model. Pooling layers reduce the dimension and consolidate the learned features into their most essential elements. A pool length of 2 is often used in practice and is also adopted in this study. Several convolutional layers can be stacked to extract more complex features. The hyperparameters that need to be determined include the kernel size (3, 5, 7, 9) and number of filters (64, 128, 256, 512) of the 1D CNN layers, and the number of training epochs (20, 50, 100). In this study, these ranges are investigated using a cross-validation process in which an optimum is selected based on accuracy and convergence time.
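The operations of one convolutional block (convolution with zero "same" padding and stride 1, ReLU, dropout and max pooling with pool length 2) can be written out in NumPy to make the shapes concrete. This is an illustrative sketch with random, untrained weights, not the study's implementation; the 250-sample two-channel window and the choice of kernel size 5 with 64 filters are hypothetical. As in most deep learning libraries, the "convolution" is computed as a cross-correlation (the kernel is not flipped).

```python
import numpy as np


def conv1d_same(x, w, b):
    """1D convolution, stride 1, zero padding so output length == input length.
    x: (T, C_in) signal, w: (K, C_in, C_out) filters with odd K, b: (C_out,)."""
    K = w.shape[0]
    pad = K // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    out = np.empty((x.shape[0], w.shape[2]))
    for t in range(x.shape[0]):
        window = xp[t:t + K]  # (K, C_in) slice currently under the filters
        out[t] = np.einsum("kc,kcf->f", window, w) + b
    return out


def relu(z):
    return np.maximum(z, 0.0)


def dropout(z, rate, rng):
    """Inverted dropout in training mode: zero a fraction `rate`, rescale the rest.
    At prediction time dropout is switched off and z is passed through unchanged."""
    keep = rng.random(z.shape) >= rate
    return z * keep / (1.0 - rate)


def max_pool(z, pool=2):
    """Non-overlapping max pooling over time; a trailing remainder is dropped."""
    T = (z.shape[0] // pool) * pool
    return z[:T].reshape(-1, pool, z.shape[1]).max(axis=1)


rng = np.random.default_rng(0)
signals = rng.normal(size=(250, 2))             # hypothetical PPG + ECG window
w = rng.normal(scale=0.1, size=(5, 2, 64))      # kernel size 5, 64 filters
b = np.zeros(64)
h = max_pool(dropout(relu(conv1d_same(signals, w, b)), 0.2, rng))
print(h.shape)  # (125, 64)
```

Zero padding keeps the 250 time steps through the convolution, and pooling with length 2 halves them to 125, one 64-dimensional feature vector per pooled step.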
The LSTM network is a special type of recurrent neural network (RNN) that is able to learn long-term dependencies (Hochreiter and Schmidhuber 1997). It has been shown to be effective for sequence prediction tasks such as speech recognition, natural language processing and machine translation (Chen et al. 2017; Cui et al. 2016; Tian et al. 2017).
A typical memory block in an LSTM contains a memory cell and three gates, namely the input, output and forget gates. The activation functions associated with the gates are usually logistic sigmoid functions. An LSTM can take multiple parallel input sequences, such as the PPG and ECG signals in this study, and can be used to learn the temporal dependencies in the raw PPG and ECG signals automatically and use them to predict BP values (Su et al. 2018). The parameter that needs to be chosen for the LSTM is the length of the state vector (10, 50, 100).
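The memory block described above can be sketched as a NumPy forward pass over a sequence: sigmoid input, forget and output gates, a tanh candidate update and a memory cell carried across time steps. This is a standard LSTM formulation shown for illustration with random, untrained weights; the stacked gate ordering, the 250-step two-channel input and the state length of 10 are assumptions chosen for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(x, W, U, b):
    """Run one LSTM layer over a sequence.
    x: (T, D) inputs, e.g. D=2 parallel channels (PPG, ECG) per time step.
    W: (4H, D), U: (4H, H), b: (4H,), stacked as input, forget, output, candidate.
    Returns all hidden states (T, H) and the final cell state (H,)."""
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    hs = []
    for x_t in x:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[0:H])          # input gate: how much new information to write
        f = sigmoid(z[H:2 * H])      # forget gate: how much of the old cell to keep
        o = sigmoid(z[2 * H:3 * H])  # output gate: how much of the cell to expose
        g = np.tanh(z[3 * H:])       # candidate cell update
        c = f * c + i * g            # memory cell carries long-term information
        h = o * np.tanh(c)           # hidden state = the "state vector"
        hs.append(h)
    return np.array(hs), c


rng = np.random.default_rng(0)
T, D, H = 250, 2, 10  # hypothetical window length; 2 channels; state length 10
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
hs, c_T = lstm_forward(rng.normal(size=(T, D)), W, U, b)
print(hs.shape, c_T.shape)  # (250, 10) (10,)
```

The state length H sets the size of both h and c, which is why it is the main architectural parameter to tune for this layer.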
CNN and LSTM are two types of deep learning structures that can each be used on their own to learn automatically from raw PPG and ECG signals to predict BP. They can also be stacked so that the output of the CNN is fed to a subsequent LSTM layer. This stacked structure first extracts useful features and then learns the long-term temporal dependencies in the raw signals. This type of structure has been used for tasks such as detection of diabetes (Goutham et al. 2018), human activity recognition (Ordóñez and Roggen 2016), continuous cardiac monitoring (Saadatnejad et al. 2020), atrial fibrillation detection (Gotlibovych et al. 2018) and classification of myocardial infarction (Baloglu et al. 2019), and it has often been found to perform well in practice.
In addition to the raw signals, this study investigates a novel deep learning structure that can also use the information contained in physical characteristics to predict BP. The model combines CNN, LSTM and Dense layers and takes raw signals and physical characteristics as input at the same time. It learns automatically to pick up useful information from both types of input data and to relate it to BP.
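The data flow of such a hybrid structure can be sketched in NumPy: a convolutional block turns the raw two-channel signal into a feature sequence, an LSTM summarizes that sequence in its final state vector, the state is concatenated with the physical characteristics, and Dense layers map the result to BP. This is a hypothetical sketch of one possible layout with random, untrained weights; the layer sizes, the fusion point, the four physical features and the two outputs (for example systolic and diastolic BP) are assumptions, not the architecture reported in the study.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_relu_pool(x, w, b, pool=2):
    """Conv1D (stride 1, zero 'same' padding) -> ReLU -> max pooling. x: (T, C)."""
    pad = w.shape[0] // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    win = sliding_window_view(xp, w.shape[0], axis=0)  # (T, C, K) windows
    z = np.maximum(np.einsum("tck,kcf->tf", win, w) + b, 0.0)
    T = (z.shape[0] // pool) * pool
    return z[:T].reshape(-1, pool, z.shape[1]).max(axis=1)


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def lstm_last_state(x, W, U, b):
    """Final hidden state of an LSTM (gate order: input, forget, output, candidate)."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in x:
        z = W @ x_t + U @ h + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        c = f * c + i * np.tanh(z[3 * H:])
        h = o * np.tanh(c)
    return h


def hybrid_forward(signals, physical, p):
    """signals: (T, 2) raw PPG + ECG; physical: (P,) characteristics; p: weights."""
    feats = conv_relu_pool(signals, p["conv_w"], p["conv_b"])          # CNN branch
    h = lstm_last_state(feats, p["lstm_W"], p["lstm_U"], p["lstm_b"])  # temporal summary
    joint = np.concatenate([h, physical])                              # fuse both inputs
    hidden = np.maximum(p["d1_W"] @ joint + p["d1_b"], 0.0)            # Dense + ReLU
    return p["d2_W"] @ hidden + p["d2_b"]                              # linear BP outputs


rng = np.random.default_rng(0)
T, C, K, F, H, P = 250, 2, 5, 16, 10, 4  # hypothetical sizes
params = {
    "conv_w": rng.normal(scale=0.1, size=(K, C, F)), "conv_b": np.zeros(F),
    "lstm_W": rng.normal(scale=0.1, size=(4 * H, F)),
    "lstm_U": rng.normal(scale=0.1, size=(4 * H, H)), "lstm_b": np.zeros(4 * H),
    "d1_W": rng.normal(scale=0.1, size=(32, H + P)), "d1_b": np.zeros(32),
    "d2_W": rng.normal(scale=0.1, size=(2, 32)), "d2_b": np.zeros(2),
}
signals = rng.normal(size=(T, C))             # hypothetical PPG + ECG window
physical = np.array([0.3, -1.2, 0.5, 1.0])    # hypothetical standardised characteristics
bp = hybrid_forward(signals, physical, params)
print(bp.shape)  # (2,)
```

In practice the weights would be learned jointly by backpropagation in a deep learning framework; the sketch only shows where the raw signals and the physical characteristics enter and how the branches meet.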