RF-BiLSTM Neural Network Incorporating Attention Mechanism for Online Ride-Hailing Demand Forecasting

Zhao, Xiangmo; Sun, Kang; Gong, Siyuan; Wu, Xia

doi:10.3390/sym15030670


by Xiangmo Zhao, Kang Sun *, Siyuan Gong and Xia Wu

School of Information Engineering, Chang’an University, Xi’an 710064, China

* Author to whom correspondence should be addressed.

Symmetry 2023, 15(3), 670; https://doi.org/10.3390/sym15030670

Submission received: 13 February 2023 / Revised: 1 March 2023 / Accepted: 4 March 2023 / Published: 7 March 2023

(This article belongs to the Section Computer)


Abstract: Accurately predicting online ride-hailing demand can help operators allocate vehicle resources on demand, avoid idle time, and improve traffic conditions. However, due to the randomness and complexity of online ride-hailing demand data, which are affected by many factors and mostly time-series in nature, it is difficult to forecast accurately and effectively based on traditional forecasting models. Therefore, this study proposes an online ride-hailing demand forecasting model that combines a random forest (RF) with a symmetric bidirectional long short-term memory (BiLSTM) neural network incorporating an attention mechanism (Att-RF-BiLSTM). The model optimizes the inputs and can use past and future data to forecast, improving the forecasting precision of online ride-hailing demand. The model utilizes a random forest to filter and optimize the input variables to reduce the neural network complexity; an attention mechanism is then incorporated into the BiLSTM neural network to construct the demand forecasting model, which is validated using actual Uber pickup data from New York City. Compared with other forecasting models (Att-XGBoost-BiLSTM, Att-BiLSTM, and pure LSTM), the results show that the proposed symmetrical Att-RF-BiLSTM online ride-hailing demand forecasting model has a higher forecasting precision and fitting degree, which indicates that the proposed model can be satisfactorily applied to the area of online ride-hailing demand.

Keywords: online ride-hailing; random forest; attention mechanism; BiLSTM; demand forecasting model

1. Introduction

As an important part of the intelligent transportation system (ITS), online ride-hailing platforms, such as Uber and Lyft, are rapidly gaining popularity among transportation users and have become one of the main travel modes for a wide range of passengers in recent years. For example, Uber alone provides more than 14 million trips a day in more than 700 cities all over the world [1]. With about 20% of adults using online taxi services in major cities, the demand for online ride-hailing has increased significantly [2]. In addition, under today’s TOD-based urban planning, passengers in the planning area are assumed to travel by nonmotorized modes and are therefore also regarded as a large group of potential users of online ride-hailing [3].

Drivers of traditional hailing services must drive their empty taxis around the city with a view to searching for their next passenger. Meanwhile, in some parts of the city, the passengers may have to spend a long time searching for a taxi. This traditional hailing service mode in cities suffers from inefficiencies due to uncoordinated actions and difficulty in adapting to changing customer needs [4]. To effectively manage the scheduling of ride-hailing services, it is necessary to accurately forecast the demand for ride-hailing in order to reasonably dispatch vehicles for passengers so as to maximize profits while providing quality services [5].

Recommending potential passenger locations to drivers can decrease their empty cruising [6,7] and also help passengers shorten their waiting time. In addition, decreasing the time and distance of empty taxi trips can effectively reduce congestion and air pollution [8]. Due to its importance, the issue of ride-hailing demand forecasting has received a lot of attention.

Over the past few years, research on the trip demand forecasting of taxis and ride-hailing services has attracted a large number of researchers’ attention. The early research on trip demand forecasting mainly focuses on mathematical modeling methods and statistical learning methods. In terms of mathematical modeling methods, Li et al. [9] proposed an improved autoregressive integrated moving average (ARIMA)-based prediction method to forecast the spatial–temporal variation of passengers in a hotspot. In this work, historical univariate time-series data on taxi demand were used to try to capture temporal periodicities with short-term patterns. Moreira-Matias et al. [10] developed a combination of ARIMA and time-varying Poisson models to predict the spatial distribution of taxi riders over short-term time horizons using streaming data. To enhance the generalization performance of the model, researchers have tried to apply statistical learning methods to taxi demand prediction, such as extreme gradient boosting (XGBoost), support vector machines (SVMs), and random forests (RFs). Gong et al. [11] presented a machine learning-based model, XGBoost, to forecast New York City (NYC) taxi demand. Faghih et al. [12] presented a novel modeling method for capturing demand for e-hailing services, specifically Uber demand, in Manhattan, New York City. In order to understand demand both spatially and temporally, this approach uses two spatiotemporal models, least absolute shrinkage and selection operator applied on spatial–temporal autoregressive (LASSO-STAR) and STAR. Liu et al. [13] utilized the LASSO method to extract important features and, based on these features by using RF and SVM, to perform short-term ride-hailing demand forecasting. Through iterative optimization of statistical methods, Chang et al. [14] developed a high-precision forecasting model, wavelet-deep Gaussian process regression (DGPR), to forecast the probability distribution of ride-hailing demand. 
This model was implemented with wavelet decomposition and the DGPR method to reduce the prediction difficulty and take into account the uncertainty in the prediction process, respectively. It can be found that mathematical modeling and statistical learning-based methods are mainly applied to taxi trip demand forecasting, while relatively few studies are oriented to online ride-hailing demand forecasting.

In recent years, deep learning models have been widely used in online ride-hailing or taxi demand forecasting due to their excellent performance in prediction. Xu et al. [15] developed a long short-term memory (LSTM) model to forecast the future taxi demand of each region of a city by the recent requests and other relevant information. The authors selected the New York City taxi requests dataset for model evaluation, divided the city into small zones, and performed demand forecasting for each zone separately. Chen et al. [16] proposed an improved deep learning convolutional neural network (CNN) called Ubernet for short-term forecasting of online ride-hailing services demand. It used a multivariate framework to explain the ride-hailing services demand using some temporal and spatial characteristics found in the literature. By reasonably introducing CNN, Ara and Hashemi [17] developed a deep learning-based model, CNN-Bidirectional LSTM (BiLSTM), for ride-hailing demand forecasting and travel demand forecasting between city neighborhood zones. This prediction model combines two types of networks, convolutional and recurrent neural networks, to predict the demand for each pickup–destination pair. To achieve accurate online ride-hailing demand forecasting in different areas of a city, an LSTM prediction model combined with the attention mechanism (LSTM + attention) was constructed by Ye et al. [18] based on the extraction of temporal features, spatial features, and weather features. To improve the flexibility of multimodal demand forecasting methods, Liang et al. [19] proposed a novel graph neural network, namely, the multirelational spatiotemporal graph neural network (ST-MRGNN), to forecast online car-hailing demand for multimodal systems. The results show that the model has a good demand forecasting performance for both subway and ride-hailing. Wu et al. [20] presented a multiview deep spatiotemporal network (MVDSTN) framework to achieve an effective representation of the spatiotemporal relationships required for online ride-hailing demand forecasting. This study also employs the LSTM network incorporating the attention mechanism to construct the prediction model. To address the inherent limitations of grid-based methods, Ara and Hashemi [21] proposed a novel neural network architecture integrating autoencoder and convolutional neural networks to best extract the spatiotemporal correlations of features. Considering the uncertainty in online ride-hailing demand forecasting, Liu et al. [22] constructed a convolutional LSTM model introducing a hexagonal convolution operation (H-ConvLSTM) to analyze the effect of spatiotemporal granularity on the accuracy of online ride-hailing demand forecasting. Huang et al. [23] proposed a novel deep learning model, the dynamic multigraph convolutional network with the generative adversarial network (DMGC-GAN), to investigate the origin–destination (OD)-based ride-hailing demand forecasting problem. It effectively applies the temporal multigraph convolutional network (TMGCN) layer and GAN network to explore the dynamic spatial and temporal correlation among OD demands. Zhang et al. [24] proposed a new deep learning architecture, namely, the locally connected spatial–temporal fully convolutional neural network (LC-ST-FCN), to simultaneously learn spatial–temporal correlations and local statistical differences among regions. Li et al. [25] proposed three hybrid deep learning models based on CNN, LSTM, GRU, BiLSTM, and ConvLSTM models considering multifactor features to forecast online ride-hailing demand. The results show that the proposed hybrid models can effectively combine the advantages of individual models and have better performance in various module combinations.

Online ride-hailing demand is usually related to spatial features, temporal features, weather features, and other features of the dataset. Most of the existing studies have been conducted from a single perspective of temporal and spatial features of online ride-hailing demand, and few of them have considered including weather features and holiday features together for analysis. The importance of different influencing features for online ride-hailing demand forecasting is different, but the existing studies mainly use all features of a given dataset as input for direct training to build the corresponding neural network models. Although this approach is simple to operate, it may cause data redundancy for datasets with a large number of features due to excessive input data. In addition, more input features can significantly increase the model training time and bring more computational load. To overcome these shortcomings, this study takes publicly available historical data on online ride-hailing demand as an example, determines the input features and the predicted output in the dataset, and proposes an online ride-hailing demand forecasting model combining a random forest algorithm and a BiLSTM incorporating an attention mechanism (Att-RF-BiLSTM). The data of Uber pickups in major boroughs of NYC from January 2015 to June 2015 are used as the experimental data for online ride-hailing demand forecasting. The key features with high correlation with the online ride-hailing demand are first screened out as inputs using the random forest algorithm, and then the symmetric BiLSTM neural network is combined with the attention mechanism to build the Att-RF-BiLSTM online ride-hailing demand forecasting model. The experimental results show that the proposed forecasting model has the best fitting effect and the best forecasting accuracy among the compared models.

The main contributions of this study are as follows:

(1) The key features required for model training are filtered by the random forest algorithm, which effectively reduces model input and improves model training efficiency while ensuring model prediction accuracy.

(2) Based on the historical data of online ride-hailing, an online ride-hailing demand forecasting model, namely, Att-RF-BiLSTM, was constructed, which has the best fitting effect and forecasting accuracy compared to other models.

The rest of the paper is organized as follows: Section 2 delves into the specific theory of each neural network model used in this study. Section 3 introduces the Att-RF-BiLSTM model in detail in terms of data preparation, model details and model evaluation indicators, and parameter selections. Section 4 explains and presents the forecasting experiments results of the proposed forecasting model and other comparative forecasting models. Finally, Section 5 gives the conclusions of this work.

2. Neural Network Modeling Theories

2.1. LSTM Neural Network

Recurrent neural networks (RNNs) can mine and analyze the time-series information in the data, so they are widely used in processing data with sequential characteristics [26]. An RNN has a temporal feedback loop and stores memory of earlier data, which makes it well suited to time-series data analysis. As shown in Figure 1, an RNN can record data information at each moment, and the input layer at each moment and the hidden layer at the previous moment jointly determine the hidden layer at the current moment; thus, it is superior in solving time-series problems. In addition, it produces different results for input sequences in a different order, which makes it sensitive to the ordering of sequence data.

Figure 2 shows the neural unit structure of an RNN. Since the neural unit of an RNN uses only a single function, this function is applied repeatedly during training, which easily leads to gradient explosion and gradient disappearance [27]; therefore, an RNN is not very effective in dealing with sequence data that span a long time.

An LSTM neural network [28] is an advanced version of an RNN that can filter and store the required information, thus effectively avoiding the problem of gradient disappearance. An LSTM can filter information and selectively forget or retain input information due to its three gating systems: input gate, output gate, and forgetting gate. Compared with the repeating module of an RNN, which has only a single tanh layer, the repeating module of an LSTM has a more complex four-layer architecture [29]. An LSTM network is composed of multiple hidden cells connected with the same structure. A hidden cell consists of four parts: cell state, forgetting gate $f_{t}$, input gate $i_{t}$, and output gate $o_{t}$. Figure 3 depicts the basic structure of an LSTM hidden cell unit.

In comparison to the RNN, an LSTM network has added cell states with three gate components, as shown in Figure 3. The LSTM structure’s forgetting gate $f_{t}$ is in charge of eliminating the unnecessary data from the previous LSTM cell that need to be filtered out, which is calculated as follows:

$$f_{t} = \sigma(W_{f} \cdot [h_{t-1}, x_{t}] + b_{f}) \tag{1}$$

In Equation (1), $f_{t}$ represents the output of the forgetting gate; $\sigma(\cdot)$ represents the sigmoid activation function, which limits the output to the range 0~1; $W_{f}$ represents the weight matrix of the forgetting gate; $x_{t}$ denotes the new input information; $h_{t-1}$ denotes the output of the previous LSTM cell; and $b_{f}$ denotes the bias vector of the forgetting gate.

The input gate $i_{t}$ determines the input of the next cell from the new input data and the data filtered by the forgetting gate $f_{t}$ and is calculated as follows:

$$i_{t} = \sigma(W_{i} \cdot [h_{t-1}, x_{t}] + b_{i}) \tag{2}$$

$$\tilde{C}_{t} = \tanh(W_{c} \cdot [h_{t-1}, x_{t}] + b_{c}) \tag{3}$$

In Equations (2) and (3), $i_{t}$ is the output of the input gate, which determines how much new information is preserved for the new cell; $\tilde{C}_{t}$ is the candidate cell state computed from the previous output and the new input; $\tanh(\cdot)$ is the hyperbolic tangent function, which compresses the result to the range −1~1; $W_{i}$ and $W_{c}$ are the weight matrices of the input gate and the cell state, respectively; and $b_{i}$ and $b_{c}$ are the bias vectors of the input gate and the cell state, respectively.

After determining the useful information of the original cell (i.e., $f_{t} * C_{t-1}$) and the retained information of the new cell (i.e., $i_{t} * \tilde{C}_{t}$), update the cell state $C_{t}$ as shown in Equation (4).

$$C_{t} = f_{t} * C_{t-1} + i_{t} * \tilde{C}_{t} \tag{4}$$

The output gate $o_{t}$ controls the output of the current status information and is calculated as follows:

$$o_{t} = \sigma(W_{o} \cdot [h_{t-1}, x_{t}] + b_{o}) \tag{5}$$

$$h_{t} = o_{t} * \tanh(C_{t}) \tag{6}$$

In Equations (5) and (6), $W_{o}$ is the weight matrix of the output gate; $b_{o}$ is the bias vector of the output gate; $o_{t}$ is the output gate activation; and $h_{t}$ is the current cell’s output.

The specific expressions of the $\sigma(\cdot)$ function and $\tanh(\cdot)$ function in the above model are as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{7}$$

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{8}$$
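Equations (1)–(6) can be read as a single update step of one LSTM cell. The following is a minimal sketch of that step in plain Python, not the implementation used in the paper; the function names (`sigmoid`, `gate`, `lstm_step`) and the layout of `params` are assumptions made for illustration only.

```python
import math


def sigmoid(x):
    """Sigmoid activation, Equation (7)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate(weights_and_bias, z, activation):
    """Compute activation(W . z + b) for one gate; W is a list of rows."""
    W, b = weights_and_bias
    return [activation(sum(w * v for w, v in zip(row, z)) + bias)
            for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM cell update following Equations (1)-(6).

    params is a hypothetical dict mapping "f", "i", "c", "o" to (W, b) pairs,
    where each W has one row per hidden unit and
    len(h_prev) + len(x_t) columns.
    """
    z = h_prev + x_t                                  # concatenation [h_{t-1}, x_t]
    f_t = gate(params["f"], z, sigmoid)               # Eq. (1): forgetting gate
    i_t = gate(params["i"], z, sigmoid)               # Eq. (2): input gate
    c_tilde = gate(params["c"], z, math.tanh)         # Eq. (3): candidate state
    c_t = [f * c + i * g                              # Eq. (4): new cell state
           for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    o_t = gate(params["o"], z, sigmoid)               # Eq. (5): output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # Eq. (6): cell output
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the cell state is simply halved at each step; this makes a convenient sanity check of the gating arithmetic.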

2.2. BiLSTM Neural Network

A unidirectional LSTM has a better forecasting effect for data with long time-series; however, it can only make a prediction using previous data information. Thus, it mainly utilizes the historical data that are close to the prediction data and forgets the information of the earlier historical data. The basic idea of BiLSTM is to process each training sequence with two LSTM neural networks, one forward and one backward, both of which are linked to the input and output layers; the output is obtained by combining the past (forward) and future (backward) layers [30]. Thus, the symmetric BiLSTM network architecture can use both past and future demand data of online ride-hailing to better simulate and forecast online ride-hailing demand.

The neurons of BiLSTM consist of the input layer, the LSTM layers (forward and backward), and the output layer. Figure 4 gives the specific details of the BiLSTM network structure, and it can be observed that the forward-LSTM layer and backward-LSTM layer are connected to the output layer.

In Figure 4, $\overset{\leftarrow}{H}$ represents the backward sequential data input, and $\vec{H}$ represents the forward sequential data input, and the data are all input into the LSTM cell structure; $x_{1} \sim x_{n}$ represent the input vectors, and $y_{1} \sim y_{n}$ denote the output prediction data.
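The bidirectional idea can be illustrated without a full LSTM: one recurrent pass reads the sequence in time order, a second pass reads it in reverse, and the two hidden states are paired at each time step. The sketch below is an illustration under stated assumptions, not the paper's network; `toy_step` is a hypothetical stand-in for an LSTM cell, and `run_direction` and `bidirectional` are names introduced here.

```python
import math


def run_direction(xs, step, h0=0.0):
    """Apply a recurrent step over xs in the given order; return every hidden state."""
    states, h = [], h0
    for x in xs:
        h = step(x, h)
        states.append(h)
    return states


def bidirectional(xs, forward_step, backward_step):
    """Pair forward and backward hidden states for each time step."""
    forward = run_direction(xs, forward_step)
    # Read the sequence from the end, then flip the states back into time order.
    backward = run_direction(xs[::-1], backward_step)[::-1]
    return list(zip(forward, backward))


def toy_step(x, h):
    """Hypothetical stand-in for an LSTM cell: a scalar tanh recurrence."""
    return math.tanh(0.5 * x + 0.5 * h)


states = bidirectional([1.0, 2.0, 3.0], toy_step, toy_step)
```

At the first time step, the forward state depends only on the first input, whereas the backward state depends on the whole sequence; this is the sense in which the combined output uses both past and future information.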

2.3. Attention Mechanism

For a neural network model, the parameters and representation capacity of the model are proportional to the stored information. As a result, when the model stores a large amount of information, it suffers from information overload. In a large amount of input data, the attention mechanism can find the most important data information for the current task [31], reduce the importance of other data, and even filter some invalid information so as to effectively address the issue of information overload and can improve the efficiency and accuracy of the model. It can better learn relevant information to affect the prediction results since it concentrates on the input data’s more important information. In addition, the computation and storage of the model are not expanded, which effectively improves the model’s efficiency. The attention mechanism is mainly applied to the generation process of the hidden state matrix H in the recurrent neural network framework.

In this paper, the attention mechanism is integrated into the BiLSTM model to perform the weighted evaluation for all features to achieve selective use of the input data, which is suitable for forecasting ride-hailing pickup time-series data in this study [32]. Figure 5 gives the flow structure of the attention mechanism.

In Figure 5, $x_{i}$ represents the input of the BiLSTM layer with the attention mechanism introduced; $h_{i}$ represents the BiLSTM layer’s output; $\alpha_{i}$ denotes the various weights of the BiLSTM’s different channels as determined by calculations based on the attention mechanism; and $y$ represents the neural network model’s final output.

The formulas for the calculation of the weighting coefficients for the attention mechanism are as follows:

$$u_{t} = \tanh(w \cdot h_{t} + b) \tag{9}$$

$$\alpha_{t} = \frac{\exp(u_{t}^{T} u_{w})}{\sum_{t} \exp(u_{t}^{T} u_{w})} \tag{10}$$

$$p_{t} = \sum_{t} \alpha_{t} \cdot h_{t} \tag{11}$$

where $u_{t}$ is the importance of the output to the result at moment $t$; $u_{w}$ is the initialized weight matrix; $w$ is the weight coefficient; $b$ is the bias vector; $\alpha_{t}$ is the feature weight of $h_{t}$; and $p_{t}$ is the output vector obtained by the weighted summation of $h_{t}$. The larger the calculated weight $\alpha_{t}$, the greater the importance of the hidden layer features at that moment and the greater the contribution of the vector $p_{t}$ to the prediction result at that moment.
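Equations (9)–(11) amount to scoring each hidden state, normalizing the scores with a softmax, and taking the weighted sum of the hidden states. The following plain-Python sketch shows that computation under some assumptions: `w` is treated as a matrix given as a list of rows, the function name `attention_pool` is hypothetical, and the maximum score is subtracted before exponentiation for numerical stability (which leaves the weights in Equation (10) unchanged).

```python
import math


def attention_pool(hidden_states, w, b, u_w):
    """Return (alphas, pooled) for a list of hidden-state vectors."""
    scores = []
    for h_t in hidden_states:
        # Eq. (9): u_t = tanh(w . h_t + b)
        u_t = [math.tanh(sum(wi * hi for wi, hi in zip(row, h_t)) + bi)
               for row, bi in zip(w, b)]
        # Score u_t^T u_w used inside the softmax of Eq. (10)
        scores.append(sum(u * v for u, v in zip(u_t, u_w)))

    # Eq. (10): softmax over time steps (shifted by the max score for stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]

    # Eq. (11): weighted sum of the hidden states
    dim = len(hidden_states[0])
    pooled = [sum(a * h[j] for a, h in zip(alphas, hidden_states))
              for j in range(dim)]
    return alphas, pooled
```

If `u_w` is all zeros, every score is equal, the weights become uniform, and the pooled vector reduces to the plain average of the hidden states; a trained `u_w` shifts weight toward the time steps that matter most.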

3. Att-RF-BiLSTM Neural Network Model Construction

3.1. Time-Series Data Preparation

3.1.1. Data Sources and Preprocessing

This paper uses the number of Uber pickups as a representation of the ride-hailing demand in a region and focuses on the forecasting of Uber pickup data. The publicly available New York City Uber Pickups dataset from January 2015 to June 2015, enriched with weather, borough, and holiday information, was selected as the data required for training and testing in this study. The dataset includes observations from seven main boroughs of NYC (i.e., Bronx, Brooklyn, EWR, Manhattan, Queens, NA, and Staten Island), and each borough has about 4343 records; therefore, the whole data contain a total of 4343 × 5 samples. As shown in Table 1, the observations collected at the same time are identical across boroughs except for the number of pickups.

According to the information presented in Table 1, it is clear that the observations in this dataset are recorded and updated hourly, and the data elements of the collected data include the time period of the observations (pickup_dt), wind speed in miles/hour (spd), visibility in miles to the nearest tenth (vsb), the temperature in Fahrenheit (temp), dew point in Fahrenheit (dewp), sea level pressure (slp), 1/6/24 h liquid precipitation (pcp01, pcp06, and pcp24), snow depth in inches (sd), being a holiday (1) or not (0) (hday), and pickups (number of pickups for the period).

3.1.2. Data Preprocessing

(1) Data Cleaning and Integration

Data cleaning is a complex task, and the success of data cleaning determines the quality of the data, so data cleaning is an important guarantee for forecasting experiments. To start with, we needed to deal with the redundant and invalid data in the original data. For example, the “No” column is not relevant to the prediction data and can be removed directly. Then, since the main consideration of this study is the whole region of NYC, the data of each borough at the same time were integrated. From the analysis above, it is clear that the pickups in the observations at the same time are different, but the rest are consistent, so only the pickups of each borough were summed at the same pickup_dt, and the other parameters were unchanged. The integrated observations for the whole of NYC are shown in Table 2. The integrated dataset contains weather observations and holiday data from 1 January 2015 to 30 June 2015 and has 4343 records; each feature is recorded hourly, 24 h a day, starting from 1:00 on the first day and from 0:00 on the remaining days.
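The integration step described above (summing pickups over boroughs at the same pickup_dt while leaving the shared weather and holiday fields unchanged) can be sketched as follows. This is an illustrative sketch only: rows are assumed to be dictionaries keyed by the column names of the dataset, the `borough` key and the function name `integrate_boroughs` are assumptions, and timestamps are assumed to sort correctly as strings.

```python
def integrate_boroughs(rows):
    """Sum pickups across boroughs for each pickup_dt.

    Fields other than "borough" and "pickups" are assumed to be identical
    for all rows sharing a pickup_dt, so the first occurrence is kept.
    """
    merged = {}
    for row in rows:
        key = row["pickup_dt"]
        if key not in merged:
            shared = {k: v for k, v in row.items()
                      if k not in ("borough", "pickups")}
            merged[key] = dict(shared, pickups=0)
        merged[key]["pickups"] += row["pickups"]
    # Return one record per timestamp, in chronological order.
    return [merged[key] for key in sorted(merged)]
```

Keeping the first occurrence of the shared fields relies on the observation in the text that everything except pickups is consistent across boroughs at a given time; if that did not hold, the shared fields would need to be reconciled explicitly.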

(2) Trend analysis of data set changes

According to the pickup_dt and Uber pickups data from the dataset, the monthly variation of average Uber pickups from January 2015 to June 2015 can be drawn as shown in Figure 6. It can be observed that in June 2015, New York City had the greatest ride-hailing (Uber) demand (the highest average pickup of 3911.6), and the lowest was in January (the lowest average pickup of 2621.5). The average Uber pickups increased at a large growth rate (28.48%) from January 2015 to February 2015, then dropped off (−9.75%) in March 2015, and then increased at a slower growth rate (4.16%, 14.47%, and 7.92%) from April 2015 onwards.
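As a quick consistency check on the figures quoted above, the monthly growth rates can be compounded from the January average to see whether they lead to the reported June average. The short sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Values quoted in the text: January average and month-over-month growth rates.
january_average = 2621.5
growth_rates = [0.2848, -0.0975, 0.0416, 0.1447, 0.0792]  # Feb, Mar, Apr, May, Jun

monthly_averages = [january_average]
for rate in growth_rates:
    # Each month's average is the previous month's average scaled by (1 + rate).
    monthly_averages.append(monthly_averages[-1] * (1.0 + rate))

implied_june_average = monthly_averages[-1]
reported_june_average = 3911.6
difference = abs(implied_june_average - reported_june_average)
```

The implied June value differs from the reported 3911.6 by less than one pickup, which is consistent with the growth rates having been rounded to two decimal places.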

(3) Weather Features Analysis

Different weather conditions affect people’s choices of travel mode. Ye et al. [18] observed and analyzed the change in the number of online ride-hailing orders over time during rainy and cloudy weather and concluded that weather conditions can have an impact on online ride-hailing demand. As described above, the dataset selected for this study contains multiple weather features (e.g., precipitation (i.e., pcp01, pcp06, and pcp24), snow depth (i.e., sd), and temperature (i.e., temp)). To further verify the existence of correlations between weather and online ride-hailing demand, we selected the precipitation and snow depth features (i.e., pcp01, pcp06, pcp24, and sd) for statistical analysis. Because precipitation and snow depth are both recorded as specific values, they were divided into categories in order to qualitatively analyze the impact of rain and snow on online ride-hailing demand. The precipitation values are low, so precipitation is divided into only two categories, rainfall and no rainfall, according to whether the values of pcp01, pcp06, and pcp24 are greater than 0 or equal to 0. The snow depth data cover a larger range and are therefore classified into four categories based on the value of sd: no snowfall (i.e., sd = 0), light snow (i.e., 0 < sd <= 3.94 in.), moderate snow (i.e., 3.94 in. < sd <= 7.87 in.), and heavy snow (i.e., sd > 7.87 in.). The bar graphs and line graphs of the average Uber pickups under the different categories of precipitation and snow depth are shown in Figure 7 and Figure 8, respectively.
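The categorization rules above translate directly into two small functions. The sketch below uses the thresholds given in the text (0, 3.94, and 7.87 inches); the function names and the handling of negative values are assumptions made for illustration.

```python
def rain_category(precipitation):
    """Two-way split used for pcp01, pcp06, and pcp24."""
    if precipitation < 0:
        raise ValueError("precipitation cannot be negative")
    return "rainfall" if precipitation > 0 else "no rainfall"


def snow_category(sd_inches):
    """Four-way split of snow depth (sd), in inches."""
    if sd_inches < 0:
        raise ValueError("snow depth cannot be negative")
    if sd_inches == 0:
        return "no snowfall"
    if sd_inches <= 3.94:
        return "light snow"
    if sd_inches <= 7.87:
        return "moderate snow"
    return "heavy snow"
```

Averaging the pickups within each returned category would give the kind of per-category summary that the text describes for Figure 7 and Figure 8.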

Figure 7 shows that the presence of rainfall has a significant effect on Uber pickups and that the effect depends on the duration of rainfall. The average Uber pickup is slightly higher when there is rainfall in 1 h than when there is no rainfall, while the average Uber pickup is significantly lower when there is rainfall in 6 h and 24 h than when there is no rainfall. This indicates that rainfall in the short term does not have a significant impact on passengers’ use of online ride-hailing travel, but prolonged rainfall can negatively affect road traffic conditions, which in turn affects online ride-hailing demand. As shown in Figure 8, as snowfall levels increase, the average Uber pickup gradually decreases, and it decreases especially sharply during heavy snowfall. Similarly, this indicates that snowfall also has an impact on road traffic conditions, which further affects online ride-hailing demand. Therefore, weather features were selected as input features in this study.

(4) Data Normalization

Differences in the order of magnitude of the data can cause the feature values with larger magnitudes to have a disproportionate impact on the prediction and can slow the convergence of the algorithm iterations [33]. Therefore, it is necessary to normalize the data in order to reduce the effect of different magnitudes on the prediction results. The normalization operation performs a linear transformation of the original data and maps the result to the range [0, 1] [34], which can significantly boost the model iteration convergence speed and prediction accuracy.

In this paper, the minimum–maximum normalization transformation was applied to all attributes, and the transformation method is shown in Equation (12).

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{12}$$

In Equation (12), $x_{\max}$ and $x_{\min}$ are the maximum and minimum values in the column, respectively, and $x^{*}$ is the normalized value of $x$.

Figure 9 and Figure 10 show the box plots of the feature values before and after normalization. It is clear from the figures that there is a large difference in magnitude among the feature values of the original data. After the data are normalized to the range [0, 1], an inverse normalization operation is required to restore the normalized values to their original scale.
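Equation (12) and the inverse normalization mentioned above can be sketched for a single column as follows. The function names are illustrative, and the treatment of a constant column (where the maximum equals the minimum) is an assumption added here to avoid division by zero; the text does not discuss that case.

```python
def min_max_fit(column):
    """Return (x_min, x_max) for one column of values."""
    return min(column), max(column)


def normalize(column, x_min, x_max):
    """Equation (12): map each value to the range [0, 1]."""
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant column is mapped to 0.0 everywhere.
        return [0.0 for _ in column]
    return [(x - x_min) / span for x in column]


def denormalize(scaled, x_min, x_max):
    """Inverse of Equation (12): restore values to the original scale."""
    return [s * (x_max - x_min) + x_min for s in scaled]


temps = [10.0, 20.0, 30.0]                 # illustrative values only
lo, hi = min_max_fit(temps)
scaled = normalize(temps, lo, hi)
restored = denormalize(scaled, lo, hi)
```

The same `x_min` and `x_max` must be reused when restoring values, which is why they are returned separately rather than recomputed from the scaled data.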

3.2. Att-RF-BiLSTM Online Ride-Hailing Demand Forecasting Model

The Att-RF-BiLSTM neural network model, which combines random forest-based feature selection with a BiLSTM neural network incorporating an attention mechanism, is proposed for online ride-hailing demand forecasting. First, the data of the features influencing online ride-hailing demand are normalized, that is, the sample data of different orders of magnitude and units are preprocessed. Second, a random forest is used to measure feature importance and to filter out the influencing features with a higher degree of association with online ride-hailing demand as the input data. Third, the BiLSTM model with the attention mechanism is trained using the training dataset. Finally, the trained Att-RF-BiLSTM model is tested using the test dataset.

3.2.1. Random Forest-Based Key Feature Selection

To remove some redundant data and retain the features with high relevance to online ride-hailing demand forecasting, in this study, we used the random forest algorithm [35] to measure the significance of the features of the sample data and extract the key features with high relevance in the process of online ride-hailing demand forecasting so as to avoid the problem of data redundancy caused by too many input variables. This consisted of two steps as follows:

1. Calculation for the importance of the influencing feature X:

(1) The sample data that are not drawn when building a tree form the out-of-bag (OOB) data, which serve as the test set; the drawn sample data are used to build the random forest model, and the performance of the model is assessed by calculating the out-of-bag data error, hereafter referred to as $\mathrm{errOOB}_{1}$;

(2) Calculate the out-of-bag data error again, denoted as $\mathrm{errOOB}_{2}$, after adding noise interference (i.e., randomly changing the sample value at feature X) to feature X for all samples of the out-of-bag data;

(3) Assuming $N$ trees in the random forest, the following equation gives the importance measure of the influencing feature X:

$$W = \frac{\sum (\mathrm{errOOB}_{2} - \mathrm{errOOB}_{1})}{N} \tag{13}$$

2. The importance measure of each influencing feature X in the sample data can be obtained by the above operation, and then the key features with higher importance measures are selected from the total influencing features according to the corresponding rules. In this way, the key features that have a high correlation with online ride-hailing demand can be filtered out, and the number of input variables in the model can be reduced to achieve better online ride-hailing demand forecasting with fewer input variables. The specific evaluation process is as follows:

(1) Calculate the importance of each influencing feature X according to Equation (13), and rank them in descending order.

(2) An importance threshold $\lambda$ is determined based on the feature importance ranking results, and the features above threshold $\lambda$ are retained as the key features, while features below threshold $\lambda$ are rejected.

(3) The required $m$ influencing features can be obtained according to the above steps.
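The two steps above can be sketched in plain Python as a permutation-style importance measure followed by threshold selection. This is a sketch under explicit assumptions rather than the authors' code: each tree is represented by a hypothetical triple `(predict, oob_X, oob_y)`, the out-of-bag error is taken to be the mean squared error, the noise is introduced by shuffling the feature column within the out-of-bag data, and all function names are illustrative.

```python
import random


def oob_error(predict, X, y):
    """Mean squared error of one tree on its out-of-bag samples."""
    return sum((predict(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def feature_importance(trees, feature, rng):
    """Equation (13): average increase in OOB error after perturbing one feature.

    trees is a list of (predict, oob_X, oob_y); feature is a column index.
    """
    total = 0.0
    for predict, oob_X, oob_y in trees:
        err_oob_1 = oob_error(predict, oob_X, oob_y)
        # Add noise by shuffling the values of the chosen feature.
        shuffled = [row[feature] for row in oob_X]
        rng.shuffle(shuffled)
        noisy_X = [row[:feature] + [value] + row[feature + 1:]
                   for row, value in zip(oob_X, shuffled)]
        err_oob_2 = oob_error(predict, noisy_X, oob_y)
        total += err_oob_2 - err_oob_1
    return total / len(trees)


def select_features(importances, threshold):
    """Rank features by importance (descending) and keep those above the threshold."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score > threshold]
```

A feature that the trees never use leaves the out-of-bag error unchanged when shuffled, so its importance is zero and it falls below any positive threshold $\lambda$; features whose perturbation degrades the predictions receive larger scores and are retained.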

Figure 11 presents the feature importance ranking obtained with the random forest in the process of online ride-hailing demand forecasting. From the figure, it can be seen that the importance of pcp01, slp, sd, dewp, and spd for online ride-hailing demand is relatively high among all the influencing features and is greater than the threshold value $\lambda = 0.1$. Therefore, the input variables are composed of pcp01, slp, sd, dewp, and spd (i.e., $m = 5$), and the sample dataset consists of the feature sets of these input variables.

3.2.2. Att-RF-BiLSTM Neural Network Online Ride-Hailing Demand Forecasting Model

The execution flow chart of the proposed Att-RF-BiLSTM online ride-hailing forecasting model is shown in Figure 12. Firstly, the observed data of major boroughs in NYC were processed to extract Uber pickup observations and other observations. Secondly, feature measurements were performed, and key influencing features were filtered out using a random forest from numerous features and used as input data. Then, a forecasting model combining attention mechanism and BiLSTM was built, and it was trained using the updated key influencing features dataset. Finally, the best forecasting model was selected according to the model evaluation indicators to obtain more accurate forecasting values of online ride-hailing demand.

3.3. Model Evaluation Indicators and Parameters Selection

3.3.1. Model Evaluation Indicators

In order to quantitatively analyze the prediction effect of the proposed model, mean absolute error (MAE) and mean squared error (MSE) were selected as the evaluation indicators of the model prediction accuracy [36].

(1) MAE is the arithmetic mean of the absolute deviations between the predicted and observed values, which prevents positive and negative errors from cancelling each other out. The smaller the value of this indicator, the smaller the prediction error and the better the prediction effect. The calculation of MAE is given in Equation (14).

MAE = \frac{1}{n} \sum_{i = 1}^{n} | (y_{i} - \tilde{y_{i}}) |

(14)

(2) MSE is the sum of the squared deviations between the predicted and observed values divided by the number of samples, which measures how far the predicted values deviate from the observed values. A smaller value of this indicator also means a higher prediction accuracy. The calculation of MSE is given in Equation (15).

MSE = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - \tilde{y_{i}})}^{2}

(15)

In Equations (14) and (15), n is the total number of samples, y_{i} is the observed value of the ride-hailing demand, and \tilde{y_{i}} is the predicted value of the ride-hailing demand.
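As a small worked illustration of Equations (14) and (15), the two indicators can be computed as below. The function names and the sample values are hypothetical and are not taken from the paper's data.

```python
def mae(observed, predicted):
    """Mean absolute error, Equation (14)."""
    n = len(observed)
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / n


def mse(observed, predicted):
    """Mean squared error, Equation (15)."""
    n = len(observed)
    return sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n


# Hypothetical normalized demand values.
observed = [1.0, 2.0, 4.0]
predicted = [1.5, 2.0, 3.0]

example_mae = mae(observed, predicted)
example_mse = mse(observed, predicted)

# Errors of +1 and -1 average to zero, but MAE does not let them cancel.
cancelling_mae = mae([0.0, 0.0], [1.0, -1.0])
```

Because MSE squares each deviation, a single large error raises it more than several small ones, which is why the two indicators are reported together.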

3.3.2. Model Parameters Selection

The Sklearn framework of the Python platform served as the foundation for training in this paper. To prevent model overfitting, a dropout layer was added. With the dropout parameter set to 0.01, the neurons in each layer of the network are dropped at random with a probability of 0.01, which improves the generalization ability of the built model.
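The effect of a dropout rate of 0.01 can be illustrated with a short sketch. It assumes the common "inverted dropout" formulation, in which surviving activations are rescaled by 1 / (1 - rate); the paper does not state which formulation was used, and the function name `dropout` and the input values are hypothetical.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the survivors.

    Assumes inverted dropout; applied during training only.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)                      # fixed seed, for reproducibility only
layer_output = [1.0] * 10000                # hypothetical activations
dropped_out = dropout(layer_output, rate=0.01, rng=rng)
fraction_dropped = dropped_out.count(0.0) / len(dropped_out)
```

With a rate this small, roughly one activation in a hundred is zeroed per training step, so the regularization is mild.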

In addition, training an RNN generally takes a long time, and the training time is governed by parameters such as the size of the input sample set, the batch size, and the number of epochs. Therefore, the selection of model hyperparameters plays a significant part in the training process, and the quality of the parameter selection significantly affects the model training effect.

Here, the configuration of the dropout parameter is used to illustrate how parameters were configured; a similar process was used for the other parameters. Following the control variable method, the other parameters were fixed first, the dropout parameter was set to 0.01, 0.1, 0.5, and 0.9 in turn, and the model was trained for each value. When dropout is set to 0.01, the model attains the lowest MAE value on both the training and test sets, which indicates that the model fits best. As a result, the dropout parameter was set to 0.01. The variation of MAE with dropout is shown in Table 3.
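The selection step of the control variable method amounts to picking the setting with the lowest error. The sketch below applies it to the MAE values reported in Table 3; the dictionary layout and the helper name `best_setting` are assumptions made for illustration.

```python
# MAE values from Table 3: dropout value -> (training set MAE, test set MAE).
table3_mae = {
    0.9: (0.0238, 0.0324),
    0.5: (0.0217, 0.0303),
    0.1: (0.0212, 0.0295),
    0.01: (0.0196, 0.0284),
}

TRAIN, TEST = 0, 1


def best_setting(results, split):
    """Return the candidate value with the lowest MAE on the given split."""
    return min(results, key=lambda value: results[value][split])


best_on_train = best_setting(table3_mae, TRAIN)
best_on_test = best_setting(table3_mae, TEST)
```

Both splits agree on the same dropout value, which matches the choice described in the text.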

A similar approach was used to configure the model separately for other input parameters, such as learning rate, epoch, batch size, and window. The learning rate decreased in the order of 0.1, 0.01, and 0.001; the epoch increased in the order of 100; the batch size increased in the order of 128; and the window increased in the order of 3.

By the above method, each control parameter of the proposed Att-RF-BiLSTM model was determined. The number of training epochs was set to 500, the learning rate was set to 0.001, the batch size was set to 256, the data time window step was set to 5, the dropout rate was set to 0.01, the number of LSTM units was set to 16, and the input and output sizes were set to 10 and 1, respectively. To reduce the training loss of the model and avoid local optima, the Adam optimizer was chosen in this study to update the network weight parameters during backpropagation.
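One plausible reading of the "data time window step" of 5 is that each training sample contains the previous five time steps and the target is the demand at the following step. The sketch below builds such samples under that assumption; the paper does not spell out the construction, and the function name `make_windows` and the toy series are hypothetical.

```python
def make_windows(features, target, window=5):
    """Build (input window, next-step target) pairs from an ordered series.

    Assumes sample k uses rows k .. k+window-1 as input and predicts
    the target at row k+window.
    """
    inputs, outputs = [], []
    for end in range(window, len(target)):
        inputs.append(features[end - window:end])
        outputs.append(target[end])
    return inputs, outputs


# Toy series (hypothetical): 8 hourly rows with two features each.
features = [[float(t), float(t) * 0.5] for t in range(8)]
target = [10.0 * t for t in range(8)]

window_inputs, window_targets = make_windows(features, target, window=5)
```

A series of length T yields T - 5 samples under this construction, and each sample has shape (5, number of features).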

4. Experimental Analysis

The dataset produced by the data preparation stage in Section 3.1 was used in this experiment. It consists of 4343 sample records of real-time observations in NYC, each with several features, such as spd, vsb, temp, and dewp. The dataset was randomly split into a 90% training set and a 10% test set for model training. To validate the superiority of the Att-RF-BiLSTM online ride-hailing demand forecasting model proposed in this paper, comparative experiments were conducted with the Att-XGBoost-BiLSTM, Att-BiLSTM, and pure LSTM forecasting models.
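A random 90%/10% split of 4343 records can be sketched as follows. The seed, the rounding down of the training-set size, and the function name `random_split` are assumptions; the paper does not report how the split was implemented.

```python
import random


def random_split(n_samples, train_fraction=0.9, seed=42):
    """Shuffle the row indices and cut them into training and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(n_samples * train_fraction)  # rounds down (an assumption)
    return indices[:cut], indices[cut:]


train_idx, test_idx = random_split(4343)
n_train, n_test = len(train_idx), len(test_idx)
```

Under these assumptions the split yields 3908 training records and 435 test records, with every record assigned to exactly one of the two sets.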

4.1. Analysis of Loss Curves of the Proposed Att-RF-BiLSTM Model

Figure 13 shows the loss curves of the proposed Att-RF-BiLSTM model over 500 iterations of model training and model testing. The figure shows that the loss function has basically converged after 100 iterations, and after 410 iterations the loss value changes very little and remains essentially constant. The test loss is also lower than the training loss, which indicates that the model performs well on the test set and exhibits strong generalization ability.

4.2. Analysis of Forecasting Results

To validate the performance of the proposed random forest-based key feature selection method, we constructed a forecasting model with XGBoost-based key feature selection [37], Att-XGBoost-BiLSTM, and compared its forecasting results with those of the proposed Att-RF-BiLSTM forecasting model. The importance score ranking of the different features based on XGBoost is shown in Figure 14. Unlike the random forest-based method, which obtains the importance percentages of different features, the XGBoost-based feature importance analysis obtains importance scores, where a higher score means higher importance. According to the analysis results in Section 3.2.1, the random forest method yields five key features. To keep the comparison experiments consistent, the five features with the highest XGBoost scores were also selected as key features: slp, dewp, temp, spd, and pcp24. The key features obtained by the two feature selection methods are not the same, but three of them, slp, dewp, and spd, are shared by both.
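The overlap between the two selected feature sets can be checked directly. The feature names below are the ones reported in the text; the variable names are illustrative.

```python
# Key features reported for each selection method.
rf_features = ["pcp01", "slp", "sd", "dewp", "spd"]
xgb_features = ["slp", "dewp", "temp", "spd", "pcp24"]

# Features chosen by both methods, kept in the random forest order.
shared = [f for f in rf_features if f in xgb_features]

# Features chosen by only one of the two methods.
only_rf = [f for f in rf_features if f not in xgb_features]
only_xgb = [f for f in xgb_features if f not in rf_features]
```

The two methods therefore differ in two of their five inputs: pcp01 and sd appear only in the random forest set, while temp and pcp24 appear only in the XGBoost set.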

Due to the large amount of data, it is difficult to display all the results. Therefore, in this paper, we randomly selected continuous data in the dataset, i.e., 10 and 8 days of continuous data as the training and testing output, respectively, and obtained the fitting effect plots in Figure 15 and Figure 16.

From Figure 15 and Figure 16, it can be seen that the Uber pickup trends forecast by the Att-RF-BiLSTM model and the Att-XGBoost-BiLSTM model are consistent with the actual Uber pickup trends in both the training and testing processes, indicating that forecasting models based on the key feature selection method are suitable for online ride-hailing demand forecasting. However, compared with the Att-XGBoost-BiLSTM forecasting model, the Att-RF-BiLSTM forecasting model in this paper fits the true values better, and its curve is closer to the true values at the fluctuation positions.

To verify the forecasting accuracy of the online ride-hailing demand forecasting model, the MAE and MSE values of Att-RF-BiLSTM and Att-XGBoost-BiLSTM on the test set are shown in Table 4. The MAE and MSE values of Att-RF-BiLSTM are 7.52% and 16.67% lower, respectively, than those of Att-XGBoost-BiLSTM, which means that the forecasting accuracy of Att-RF-BiLSTM is higher and the errors between the forecast and actual values of Uber pickups are smaller. These results show that the key features screened by random forest for online ride-hailing demand forecasting are more reliable than those screened by XGBoost. In conclusion, for multivariate time-series with large data sizes, the Att-RF-BiLSTM model proposed by this study provides better fitting performance and more accurate forecasting results.

To further validate the advantages of the proposed Att-RF-BiLSTM online ride-hailing demand forecasting model, the training and testing results of Att-RF-BiLSTM were compared with those of the Att-BiLSTM and LSTM models, and the final comparison results are shown in Figure 17 and Figure 18.

The results of the comparison between the forecast and true values of Uber pickups in training and testing for the three models, Att-RF-BiLSTM, Att-BiLSTM, and LSTM, are shown in Figure 17 and Figure 18. These figures show that the Uber pickup values forecast by the Att-RF-BiLSTM, Att-BiLSTM, and LSTM models are basically consistent with the trend of the real values during the training and testing processes. This indicates that LSTM-based neural network models perform well in online ride-hailing demand forecasting. As can be seen from Figure 17, the fit of the Att-BiLSTM model to the true values during training differs little from that of the Att-RF-BiLSTM model, while the fit of the LSTM model differs slightly from both. As shown in Figure 18, the LSTM model fits poorly at some critical times with large data variation. For example, at the local minima around the 55th, 80th, and 128th data points, the deviation between the forecast Uber pickups and their true values is greatest for the LSTM model, followed by the Att-BiLSTM model, and smallest for the Att-RF-BiLSTM model. Therefore, the proposed Att-RF-BiLSTM model shows the best fitting effect among the compared models in both the training and testing processes, especially at some critical times.

Table 5 compares the testing errors of the Att-RF-BiLSTM, Att-BiLSTM, and LSTM models. Compared with the Att-BiLSTM and LSTM models, the MAE value of Att-RF-BiLSTM is lower by 11.01% and 18.21%, respectively, and its MSE value is lower by 28.57% and 40%, respectively. The Att-RF-BiLSTM model has the lowest forecasting error and the highest forecasting accuracy. This further indicates that the proposed Att-RF-BiLSTM model can better describe the nonlinear variation of online ride-hailing demand and has promising application potential for online ride-hailing demand forecasting.
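The percentage reductions quoted for Table 4 and Table 5 follow from the relative difference (baseline - proposed) / baseline. The sketch below recomputes them from the tabulated MAE and MSE values; the function name `relative_reduction` and the dictionary layout are illustrative.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100.0


# Test-set errors of the proposed Att-RF-BiLSTM model (Tables 4 and 5).
proposed = {"MAE": 0.0283, "MSE": 0.0015}

# Test-set errors of the comparison models (Tables 4 and 5).
baselines = {
    "Att-XGBoost-BiLSTM": {"MAE": 0.0306, "MSE": 0.0018},
    "Att-BiLSTM": {"MAE": 0.0318, "MSE": 0.0021},
    "LSTM": {"MAE": 0.0346, "MSE": 0.0025},
}

reductions = {
    model: {metric: round(relative_reduction(errors[metric], proposed[metric]), 2)
            for metric in ("MAE", "MSE")}
    for model, errors in baselines.items()
}
```

For example, the MAE reduction relative to Att-XGBoost-BiLSTM is (0.0306 - 0.0283) / 0.0306, which rounds to 7.52%.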

5. Conclusions

In this paper, we propose a novel symmetric Att-RF-BiLSTM network for online ride-hailing demand forecasting. It first uses the random forest algorithm to extract the key features that have a high impact on online ride-hailing demand and uses them as inputs. The demand forecasting model for online ride-hailing is then built by combining the attention mechanism and BiLSTM. This architecture retains the complete past and future information for forecasting through the symmetric BiLSTM network and dynamically learns the key information that influences the results through the attention mechanism. Compared with the traditional way of using the full feature sequence as the prediction model inputs, this study uses the random forest algorithm to extract the key feature sequence as the new model inputs, which can effectively simplify the feature inputs and improve the model training efficiency. To validate the superiority of the proposed Att-RF-BiLSTM model, it was evaluated on Uber pickup data collected in major boroughs of NYC and compared with the Att-XGBoost-BiLSTM, Att-BiLSTM, and LSTM models. The results show that the proposed Att-RF-BiLSTM model is more reliable, has a higher fitting degree and forecasting accuracy, and performs better in forecasting online ride-hailing demand accurately and effectively. Specifically, compared with Att-XGBoost-BiLSTM, Att-BiLSTM, and LSTM, the MAE value of Att-RF-BiLSTM is lower by 7.52%, 11.01%, and 18.21%, respectively, and its MSE value is lower by 16.67%, 28.57%, and 40%, respectively. Therefore, for the forecasting analysis of multivariate time-series data with large data sizes, the combination of an attention mechanism with random forest and a BiLSTM neural network is practical and superior and has good application prospects.

Although this paper considered time features, weather features, and holiday features as input features, short-term online ride-hailing demand is also related to other factors. Future work will consider more complex urban traffic networks and try to explore more features related to online ride-hailing demand to further build more reliable forecasting networks to improve forecasting accuracy. Secondly, due to limited data, this paper only validates the model on the online publicly available dataset of NYC, USA. In future work, we will seek collaboration with online ride-hailing platforms to obtain new datasets to further verify the applicability of the proposed model. Finally, this study focuses on time-related features to forecast online ride-hailing demand. In future work, a more comprehensive dataset with larger data including spatial factors will be selected to construct a prediction model considering both temporal and spatial factors. In addition, the correlations between online ride-hailing and other travel modes will be analyzed, and the model will be optimized and updated to build a joint demand forecasting model.

Author Contributions

Conceptualization, K.S. and X.Z.; methodology, K.S. and X.Z.; validation, K.S.; writing—original draft preparation, K.S.; writing—review and editing, X.Z., S.G., K.S. and X.W.; project administration, X.Z., S.G. and K.S.; funding acquisition, X.Z., S.G. and X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (2021YFB2501200), the National Key R&D Program of China (2019YFB1600100), NSFC (71901038), the 111 Project on Information of Vehicle–Infrastructure Sensing and ITS (B14043), the Joint Laboratory for Internet of Vehicles (213024170015), the Shaanxi Province Science Foundation (2020JQ-392, 2022JQ-663), the China Postdoctoral Science Foundation (2022M710483), and research funds for the Central Universities, Chang’an University (300102240301, 300102242103).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Uber Official Website Data. Available online: https://www.uber.com/en-GB/newsroom/company-info/ (accessed on 1 December 2020).
2. Clewlow, R.; Mishra, G. The Adoption, Utilization, and Impacts of Ride-Hailing in the United States; Research Report; University of California, Davis, Institute of Transportation Studies: Davis, CA, USA, 2017.
3. Nourbakhshrezaei, A.; Jadidi, M.; Sohn, G. Improving Cyclists’ Safety Using Intelligent Situational Awareness System. Sustainability 2023, 15, 2866.
4. Bozdogan, H. Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika 1987, 52, 345–370.
5. Qian, X.; Ukkusuri, S.V.; Yang, C.; Yan, F. A model for short-term taxi demand forecasting accounting for spatio-temporal correlations. In Proceedings of the Transportation Research Board 96th Annual Meeting, Washington, DC, USA, 8–12 January 2017. Research Report No. 17-02470.
6. Wang, D.; Cao, W.; Li, J.; Ye, J. DeepSD: Supply-demand prediction for online car-hailing services using deep neural networks. In Proceedings of the 2017 IEEE 33rd international conference on data engineering (ICDE), San Diego, CA, USA, 19–22 April 2017.
7. Safikhani, A.; Kamga, C.; Mudigonda, S.; Faghih, S.S.; Moghimi, B. Spatio-temporal modeling of yellow taxi demands in New York City using generalized STAR models. Int. J. Forecast. 2020, 36, 1138–1148.
8. Li, Y.; Lu, J.; Zhang, L.; Zhao, Y. Taxi booking mobile app order demand prediction based on short-term traffic forecasting. Transp. Res. Record 2017, 2634, 57–68.
9. Li, X.; Pan, G.; Wu, Z.; Qi, G.; Li, S.; Zhang, D.; Zhang, W.; Wang, Z. Prediction of urban human mobility using large-scale taxi traces and its applications. Front. Comput. Sci. 2012, 6, 111–121.
10. Moreira-Matias, L.; Gama, J.; Ferreira, M.; Mendes-Moreira, J.; Damas, L. Predicting taxi–passenger demand using streaming data. IEEE Trans. Intell. Transp. Syst. 2013, 14, 1393–1402.
11. Predict New York City Taxi Demand|NYC Data Science Academy Blog. Available online: https://nycdatascience.com/blog/student-works/predict-new-york-city-taxi-demand/ (accessed on 21 July 2018).
12. Faghih, S.S.; Safikhani, A.; Moghimi, B.; Kamga, C. Predicting short-term Uber demand using spatio-temporal modeling: A New York City Case Study. arXiv 2017, arXiv:1712.02001.
13. Liu, J.; Cui, E.; Hu, H.; Chen, X.; Chen, X.; Chen, F. Short-term forecasting of emerging on-demand ride services. In Proceedings of the 2017 4th International Conference on Transportation Information and Safety (ICTIS), Banff, AB, Canada, 8–10 August 2017; pp. 489–495.
14. Chang, W.; Li, R.; Fu, Y.; Xiao, Y.; Zhou, S. A multistep forecasting method for online car-hailing demand based on wavelet decomposition and deep Gaussian process regression. J. Supercomput. 2022, 79, 3412–3436.
15. Xu, J.; Rahmatizadeh, R.; Bölöni, L.; Turgut, D. Real-time prediction of taxi demand using recurrent neural networks. IEEE Trans. Intell. Transp. Syst. 2017, 19, 2572–2581.
16. Chen, L.; Thakuriah, P.V.; Ampountolas, K. Short-term prediction of demand for ride-hailing services: A deep learning approach. J. Big Data Anal. Transp. 2021, 3, 175–195.
17. Ara, Z.; Hashemi, M. Ride hailing service demand forecast by integrating convolutional and recurrent neural networks. In Proceedings of the 33rd International Conference on Software Engineering and Knowledge Engineering, Pittsburgh, PA, USA, 1–10 July 2021.
18. Ye, X.; Ye, Q.; Yan, X.; Wang, T.; Chen, J.; Li, S. Demand Forecasting of Online Car-Hailing with Combining LSTM+ Attention Approaches. Electronics 2021, 10, 2480.
19. Liang, Y.; Huang, G.; Zhao, Z. Joint demand prediction for multimodal systems: A multi-task multi-relational spatiotemporal graph neural network approach. Transp. Res. Part C Emerg. Technol. 2022, 140, 103731.
20. Wu, Y.; Zhang, H.; Li, C.; Tao, S.; Yang, F. Urban ride-hailing demand prediction with multi-view information fusion deep learning framework. Appl. Intell. 2022, 1–19.
21. Ara, Z.; Hashemi, M. Predicting Ride Hailing Service Demand Using Autoencoder and Convolutional Neural Network. Int. J. Softw. Eng. Knowl. Eng. 2022, 32, 109–129.
22. Liu, K.; Chen, Z.; Yamamoto, T.; Tuo, L. Exploring the impact of spatiotemporal granularity on the demand prediction of dynamic ride-hailing. IEEE Trans. Intell. Transp. Syst. 2022, 24, 104–114.
23. Huang, Z.; Zhang, W.; Wang, D.; Yin, Y. A GAN framework-based dynamic multi-graph convolutional network for origin–destination-based ride-hailing demand prediction. Inf. Sci. 2022, 601, 129–146.
24. Zhang, D.; Xiao, F.; Kou, G.; Luo, J.; Yang, F. Learning Spatial-Temporal Features of Ride-Hailing Services with Fusion Convolutional Networks. J. Adv. Transp. 2023, 2023, 4427638.
25. Li, S.; Yang, H.; Cheng, R.; Ge, H. Hybrid deep learning models for short-term demand forecasting of online car-hailing considering multiple factors. Transp. Lett. 2023, 1–16.
26. Abd El-Karim, M.S.B.A.; Mosa El Nawawy, O.A.; Abdel-Alim, A.M. Identification and assessment of risk factors affecting construction projects. HBRC J. 2017, 13, 202–216.
27. Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; Wang, Y. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transp. Res. Pt. C-Emerg. Technol. 2015, 54, 187–197.
28. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610.
29. Chen, D.; Zhang, J.; Jiang, S. Forecasting the short-term metro ridership with seasonal and trend decomposition using loess and LSTM neural networks. IEEE Access 2020, 8, 91181–91187.
30. Zhao, G.; Jiang, P.; Lin, T. Remaining Life Prediction of Rolling Bearing Based on CNN-BiLSTM Model with Attention Mechanism. J. Mech. Electr. Eng. 2021, 38, 1253–1260.
31. Hochreiter, S.; Schmidhuber, J.R. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
32. Mahato, N.K.; Dong, J.; Song, C.; Chen, Z.; Wang, N.; Ma, H.; Gong, G. Electric Power System Transient Stability Assessment Based on Bi-LSTM Attention Mechanism. In Proceedings of the 2021 6th Asia Conference on Power and Electrical Engineering (ACPEE), Chongqing, China, 8–11 April 2021.
33. Ward, J.H., Jr. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 1963, 58, 236–244.
34. Yu, T.; Pei, L.; Li, W.; Sun, Z.Y.; Huyan, J. Prediction of Pavement Surface Condition Index Based on Random Forest Algorithm. J. Highw. Transp. Res. Dev. 2021, 38, 16–23.
35. Sylvester, E.V.; Bentzen, P.; Bradbury, I.R.; Clément, M.; Pearce, J.; Horne, J.; Beiko, R.G. Applications of random forest feature selection for fine-scale genetic population assignment. Evol. Appl. 2018, 11, 153–165.
36. Wen, H.; Zhang, D. Highway traffic volume prediction based on Bi-LSTM model. Highw. Eng. 2019, 44, 51–56.
37. Prabha, A.; Yadav, J.; Rani, A.; Singh, V. Design of intelligent diabetes mellitus detection system using hybrid feature selection based XGBoost classifier. Comput. Biol. Med. 2021, 136, 104664.

Figure 1. Structure diagram of RNN.

Figure 2. The neural unit structure of RNN.

Figure 3. Basic structure diagram of LSTM cell unit.

Figure 4. BiLSTM network structure.

Figure 5. Structure of the attention mechanism.

Figure 6. Average Uber pickups change from January 2015 to June 2015.

Figure 7. Average Uber pickups change under different categories of precipitation.

Figure 8. Average Uber pickups change under different categories of snow depth.

Figure 9. Distribution of eigenvalue before normalization operation.

Figure 10. Distribution of eigenvalue after normalization operation.

Figure 11. Influencing feature importance ranking by random forest.

Figure 12. The execution flow chart of the proposed Att-RF-BiLSTM online ride-hailing forecasting model.

Figure 13. Loss curve of the proposed Att-RF-BiLSTM model.

Figure 14. Feature importance ranking by XGBoost.

Figure 15. Comparison of Att-RF-BiLSTM and Att-XGBoost-BiLSTM training output.

Figure 16. Comparison of Att-RF-BiLSTM and Att-XGBoost-BiLSTM test output.

Figure 17. Comparison of Att-RF-BiLSTM and other models’ training outputs.

Figure 18. Comparison of Att-RF-BiLSTM and other models’ test outputs.

Table 1. NYC Uber pickups of different boroughs containing weather and holidays dataset (sample).

No	Pickup_dt	Borough	Spd	Vsb	Temp	Dewp	Slp	Hday	Pickups
1	2015-1-1 1:00	Bronx	5	10	30	7	1023.5	1	152
2	2015-1-1 1:00	Brooklyn	5	10	30	7	1023.5	1	1519
3	2015-1-1 1:00	EWR	5	10	30	7	1023.5	1	0
4	2015-1-1 1:00	Manhattan	5	10	30	7	1023.5	1	5258
5	2015-1-1 1:00	Queens	5	10	30	7	1023.5	1	405
6	2015-1-1 1:00	Staten Island	5	10	30	7	1023.5	1	6
7	2015-1-1 1:00	NA	5	10	30	7	1023.5	1	4

Table 2. NYC Uber pickups containing weather and holidays dataset (sample).

Pickup_dt	Spd	Vsb	Temp	Dewp	Slp	Hday	Pickups
2015-1-1 1:00	5	10	30	7	1023.5	1	7344
2015-1-1 2:00	3	10	30	6	1023	1	6043
2015-1-1 3:00	5	10	30	8	1022.3	1	6763
2015-1-1 4:00	5	10	29	9	1022	1	4872
2015-1-1 5:00	5	10	28	9	1021.8	1	2406

Table 3. Comparison of MAE results with different dropouts.

Dropout Value	MAE (Training Set)	MAE (Test Set)
0.9	0.0238	0.0324
0.5	0.0217	0.0303
0.1	0.0212	0.0295
0.01	0.0196	0.0284

Table 4. Comparison of Att-RF-BiLSTM and Att-XGBoost-BiLSTM models evaluation indicators.

Evaluation Indicators	Att-RF-BiLSTM	Att-XGBoost-BiLSTM
MAE	0.0283	0.0306
MSE	0.0015	0.0018

Table 5. Comparison of Att-RF-BiLSTM model and other models’ evaluation indicators.

Evaluation Indicators	Att-RF-BiLSTM	Att-BiLSTM	LSTM
MAE	0.0283	0.0318	0.0346
MSE	0.0015	0.0021	0.0025

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
