01-12-2021 | Research | Issue 1/2021 Open Access

Journal of Big Data 1/2021

Optimization of air traffic management efficiency based on deep learning enriched by the long short-term memory (LSTM) and extreme learning machine (ELM)

Mahdi Yousefzadeh Aghdam, Seyed Reza Kamel Tabbakh, Seyed Javad Mahdavi Chabok, Maryam Kheyrabadi
Important notes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Abbreviations

ADS-B: Automatic dependent surveillance-broadcast
ATC: Air traffic control
ATFM: Air traffic flow management
ATM: Air traffic management
ATS: Aircraft traffic service
BLSTM: Bidirectional long short-term memory
DBN: Deep belief networks
ELM: Extreme learning machine
ETA: Estimated time of arrival
ETD: Estimated time of departure
LSTM: Long short-term memory
MSE: Mean squared error
NAS: National air system
RMSE: Root mean square error
RNN: Recurrent neural network
TCA: Traffic control area
TMA: Terminal manoeuvring area


Air traffic management (ATM) refers to the activities required for the efficient and safe management of the national air system (NAS) in each country. Generally, ATM encompasses two components: air traffic control (ATC) and air traffic flow management [1]. The ATC system mainly relies on tactical decisions (e.g., real-time separation) for collision detection. The NAS is divided into several sectors so that ATC services can be provided and air traffic controllers can be assisted in traffic control and flight separation. Air traffic control methods for the prevention of flight delay and interference are a significant issue in the operational field of ATM [1]. Additionally, fleet fuel consumption, flight delays at airports, and secondary costs impose substantial charges on airlines.
The arrival schedule of flights is considered by airlines and aviation/airport companies [2], so airports with better ATM performance attract more attention from airlines. Another challenge of flight control is that a single ATM may cover several airports, while each airport may have several pattern areas for its ATM [3]. Moreover, the traffic or pattern areas of nearby airports may be dependent or independent, and each airport may have several parallel or non-parallel runways. The traffic of parallel runways may be dependent or independent, whereas the traffic of crossing runways is always dependent. Landing and takeoff runways might differ in an airport, or they might be used jointly [3, 4]. Furthermore, each runway may have several landing and takeoff procedures, which might have dependent or independent traffic. These issues demonstrate the high complexity of modeling the problem. Considering the huge scale of air traffic data in the classification learning process, the complexity grows with the number of categories in each class. Additionally, selecting the significant features with traditional data mining approaches is almost impossible.
Various measures have been taken to solve the problems of ATM and ASP [5]. Many studies have aimed to solve these issues using mathematical models and methods, linear programming, mixed-integer planning, and statistical models. However, one limitation of these studies is that they do not consider actual data and operating environments. Therefore, the proposed solutions lack the required accuracy and efficiency in the actual environment of airport aviation operations [6–17]. Some scholars have attempted to address the issue of traffic control and delay only by considering climatic and environmental conditions [10, 18–20]. Other studies have used the first-come-first-served (FCFS) technique, along with queue models and other mixed methods, to solve the problem [21–25].
Another category of research has applied data mining methods to investigate the influential factors in air traffic and flights [19, 26–30]. Machine learning is a conventional approach used to address air traffic issues, ASL, and delay forecasting and minimization.
Machine learning is a branch of artificial intelligence and data analytics concerned with developing algorithms that learn from previous information; in effect, it is a computational method for data mining [31, 32].
Overall, this technique has shown better problem-solving efficiency than other methods [33–37]. Some studies have also supported uncertainty and fuzzy states, incorporating the latter into other techniques [38–40]. Such solutions have been proposed for ATC problems based on the Internet of Things (IoT) [16, 41, 42], optimization methods [6, 10, 12, 43], multi-objective optimization techniques [44–46], and intelligent agent-based methods [47].
An optimal approach to the mentioned issue involves the structure of artificial neural networks [25, 43, 48]. Following the evolution of neural networks, deep learning networks have been considered one of the most recent and complete solutions in this regard. This technique can solve problems with high accuracy owing to its ability to accept large problem data, integrate neural networks with learning techniques, and form hidden layers dynamically. Aviation and ATM issues are no exception, and most recent studies on flight delay forecasting and flight traffic control have benefited from this technique [1, 47, 49–51].
Multilayer (deep) neural networks belong to machine learning and comprise a set of algorithms that attempt to model high-level abstract concepts through learning at various levels and layers, thereby enabling deep learning to process large volumes of data in complicated categories [1, 49, 50, 52–54]. The current research attempts to propose an accurate method for solving the problem within the operational domain of the terminal manoeuvring area (TMA) using a combination of a deep neural network and other methods. Furthermore, the present study proposes a long short-term memory (LSTM)-based deep learning model combined with a recurrent neural network (RNN) in order to increase the predictive accuracy of short- and long-term annual windows by enhancing deep learning (two-dimensional). In the third phase, the output of the deep model was transferred to an extreme learning machine (ELM), a fast-learning deep neural machine, in order to calculate the estimated time of arrival (ETA) and estimated time of departure (ETD) of each flight based on similar input data, including data from the NAS, the bureau of transportation statistics (BTS) system, and the automatic dependent surveillance-broadcast (ADS-B) system. Ultimately, a flight control scheme was developed within the airport TMA range with a 15-min time window for flight arrival, using evolutionary and meta-heuristic algorithms to conform the flight rules to the learning outcomes and increase accuracy.
The rest of the article is structured as follows: section two reviews previous methods in the ATM field, section three describes the proposed model, section four evaluates and compares the results with other techniques, and section five presents the conclusion.

A review of previous methods

Numerous efforts have been made to solve the ATM problem and minimize the rates of ETA and ETD delay in various dimensions. Most of the studies in this regard have evaluated inbound and outbound flights separately, attempting to propose solutions using methods such as data mining and mathematical techniques (Fig.  1).
In [1], researchers used deep learning architectures such as stacked autoencoders, convolutional neural networks, and recursive neural networks for forecasting daily delay status. The aim of that study was to estimate the daily delay at each airport and to calculate the delay for a specific flight based on the obtained results. In order to forecast the daily delay, the authors initially calculated the mean delay of the inbound and outbound flights and fed the estimated value into a recursive neural network, along with the weather data as a sequence, which was added to the output after determining the weight and bias of the separate data. Weighting and biasing are iterative procedures, and each iteration determines the value of the cost function. The stacked memory-cell (LSTM) structure with sigmoid and tanh functions replaced the plain RNN structure, so the information of each hidden layer was stored, thereby increasing model efficiency. However, a limitation of the proposed method was the elimination of details related to the management of ground delays during the flight preparation of the aircraft. Another issue was the lack of a deeper LSTM structure in the time-based forecast, which would have increased accuracy.
In [26], the main objective was to minimize the estimated delay along with the sequence of flights in resolving the ATM problem; a combination of clustering and neural networks was used. The integrated technique had two steps: a clustering-based forecast and a multi-cell neural network (MCNN)-based forecast. In the first step, after cleaning, filtering, and re-analyzing the data, principal component analysis (PCA) was used to reduce the dimensions of the path vector. Afterwards, the paths were clustered into several patterns with a clustering algorithm. In the forecast phase, the MCNN model was applied to predict the four-dimensional (4D) density path. In addition, there was a predictor for each path partition, which encompassed an NN-based learning cell. Each learning cell was trained by a set of related paths, and each path set included the related prediction model. According to the results, the proposed model was stronger, more accurate, and more efficient for short-term predictions. However, its limitations included the small data scale and the lack of highly accurate learning methods (e.g., deep machine learning); these challenges are addressed in the present work by using a large data volume and the deep learning technique.
Researchers have used a combination of deep belief networks (DBNs) and PCA to evaluate national aviation safety. In general, a DBN is useful for safety prediction since each layer acquires more complicated features from the previous layers; here, the DBN predicts severe flight incident rates based on PCA results [49]. That study systematically and thoroughly assessed the main factors of unsafe events, including aircraft, landing and takeoff, aircraft operation, airport and aircraft, ground transportation, and weather. According to the obtained results, the predicted PCA-DBN data were compatible with the actual data on flight incidents, and the proposed model was considered superior to the gray neural network, support vector machine, and plain DBN. In [41], the researchers proposed a deep learning-based model to predict the hard landings of aircraft based on the quick access recorder (QAR).
In an IoT environment, devices collect data, which are sent to an open IoT cloud platform to be processed and analyzed. The prediction of a hard landing is a common application of IoT in the aviation field. Initially, 15 aircraft landing sensor signals were selected from 260 parameters based on meteorology. Afterwards, an LSTM-based deep prediction model was developed to predict hard landing incidents using the selected sensor data. The empirical results showed high performance owing to the accuracy of the QAR data. The proposed model was accurate and efficient in predicting hard landings, which helps guarantee passenger safety and decrease the rate of flight incidents.
In [39], the main goal was to detect traffic accidents in order to increase road safety using a deep learning algorithm and a stacked autoencoder model. In addition, the back-propagation algorithm was used for the accurate adjustment of the parameters in the deep network. Ultimately, a fuzzy controller was exploited to increase the output accuracy of the deep network and adjust the neural network learning parameters based on the mean squared error (MSE). According to the findings, fuzzy logic systems are suitable for uncertain or approximate reasoning and allow decision-making with estimated values under incomplete or uncertain information.
In [28], the researchers presented various data mining methods for air transport and assessed their efficacy on three types of air transport data. The first type was flight recording information provided by a flight recorder, unofficially known as the black box. In an aircraft equipped with a flight recorder, usually up to 500 variables, such as time, altitude, vertical acceleration, and vertex, are recorded per second for the flight duration; some of these variables are discrete, while others are continuous. The second type was artificial data focused on flight anomalies, intentionally embedded in the data to examine the ability of the algorithm to detect them. These anomalies might be an unusual sequence of events or an unusual period between events. The third type was aviation crash reports, which have no strict rules and consist of pilot narrations with no specific required conditions; a method should therefore be designed to determine the significant data despite their lack of uniformity.
As for labels and labeled data in aviation data mining, a label is a descriptive word allocated to data based on specific features. In that study, labels were treated as factors in the formation of a flight incident.
Some of the factors causing flight incidents included diseases, hazardous environments, and autopilot. To improve the accuracy of flight characterization, the researchers used the time warp edit distance (TWED) and k-means algorithms [29]. First, they assessed a dataset of flights over the desired time, restricted to routes with the same origin and destination, to eliminate the effect of the exit point. Then, an adapted k-means algorithm was proposed in which the distance between various paths was estimated by the TWED algorithm rather than the conventional elastic similarity measure of k-means. One benefit of the proposed method was the increased accuracy of the algorithm and the more efficient use of controlled airspace in air traffic management. On the other hand, a key limitation was that large-scale data were not considered, although using large data would have led to higher accuracy.
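The adapted algorithm described above amounts to k-means with a pluggable distance function. The following minimal sketch (not the authors' implementation; the helper names and the toy 1-D "paths" are illustrative) shows the idea: a real TWED implementation would simply be passed in place of the Euclidean distance.

```python
import numpy as np

def euclidean(a, b):
    return np.linalg.norm(a - b)

def kmeans_custom(paths, k, distance=euclidean, n_iter=20):
    """k-means-style clustering with a pluggable distance; a TWED
    implementation would be passed as `distance` to reproduce the
    adapted algorithm from [29]. Initialization here is simplistic:
    evenly spaced paths are taken as the initial centers."""
    centers = paths[np.linspace(0, len(paths) - 1, k, dtype=int)].copy()
    for _ in range(n_iter):
        # assign each path to the nearest center under the chosen distance
        labels = np.array([np.argmin([distance(p, c) for c in centers])
                           for p in paths])
        # update each center to the mean of its member paths
        for j in range(k):
            members = paths[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers

# toy "paths": two well-separated groups of short trajectories
rng = np.random.default_rng(1)
paths = np.vstack([rng.normal(0, 0.1, (10, 5)), rng.normal(3, 0.1, (10, 5))])
labels, _ = kmeans_custom(paths, k=2)
print(labels)  # first ten paths share one label, last ten the other
```

Note that with an elastic distance such as TWED the centroid update would usually be replaced by a medoid (the member path minimizing total distance), since the arithmetic mean of warped series is not always meaningful.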
In [55], the researchers proposed a hybrid of the Bayes method and the Gaussian mixture model–expectation maximization (GMM–EM) algorithm to predict and analyze the factors influencing flight delays on Brazilian aviation routes. Initially, the degree of impact of each factor was calculated using historical data. Then, Bayes rules were applied at specific points of the flight route to determine whether delays occurred in larger domains. The next stage involved estimating the probability of the delays using the GMM–EM and EM algorithms, which are based on similarity in the data. According to the obtained results, the probability of delays at high levels could be predicted by determining the factors at low levels. Moreover, the GMM–EM algorithm could find higher values of the similarity function than the EM algorithm, thereby reaching convergence sooner; the accuracy of the model also increased, which in turn improved the reliability of the prediction results.
In [8], the researchers focused on real-time aircraft routing and planning. In a crowded traffic control area (TCA), congestion is especially challenging for TCA operation management due to the growing demand for traffic, and TCAs become the bottleneck of the entire ATC system. In that research, linear programming formulations together with flight safety rules were used to minimize the maximum delay in the entire travel time caused by potential aircraft congestion. Computational tests were performed on real data from Rome airport, the largest airport in Italy in terms of passenger demand.
The solution provides an optimal compromise among various objectives. In [9], the researchers proposed a new, efficient computational algorithm to resolve the uncertainty of air traffic flow management using a chance-constrained optimization method. They first developed a chance-constrained model based on the previous integer programming optimization model of ATFM for the limitation of possible sector capacities. Afterwards, a polynomial-approximation-based approach was applied to handle chance-constrained optimization problems at large scales. One benefit of the proposed method was considering uncertainty in ATM, while its main limitation was not using deep learning methods on large data to obtain a more accurate model.
In [23], the researchers estimated the input delay time and the number of aircraft entering a controlled space at a given time using a queue model and a regression function, while also considering climatic conditions. In addition, the delay was forecast before reaching the destination by considering variables such as the aircraft type, time of arrival, and times of entering and exiting the controlled space. The overall comparison of optimization- and artificial intelligence-based methods demonstrated that the artificial intelligence methods could absorb some of the errors, which made them considerably more accurate than the queue models; meanwhile, the queue model and the recursive neural network were observed to have higher learning levels.
In another study, a recurrent 3D convolutional neural network (R-3D CNN) was applied to increase the accuracy of air traffic prediction [56]. Changes in spatial-temporal air movements could be comprehensively considered with this algorithm. In that study, a traffic situation graphics (TSG) sequence was applied to extract the prominent features. The proposed TSG enabled the consideration of some real-time factors to enrich the input information; the model input was determined by combining the traffic situations at various flight levels with the areas specified by other real-time factors, such as important tasks and public air traffic. The length of the input sequence was set to 30, 60, and 90 min before the prediction moment in order to determine the effect of temporal dependencies, so that the optimal architecture could be selected. Furthermore, the evaluation of the prediction results using three statistical factors confirmed the ability of the proposed model to yield accurate and sustainable predictions for the air traffic system distributed over various flight levels.
In [57], the main objective was to predict flight routes using a deep neural network within a capacity management and air traffic operational system. A deep neural network was trained on historical routes and a set of predictors, and it predicted the most likely route through the airspace. In addition, the network was able to generalize to flights and conditions that had not been seen before, and it could adapt to changes through repeated training on newly recorded data. In that study, an integrated solution was deployed on air traffic platforms handling 10% of the total traffic, and the results showed clear progress; as user confidence grows, the approach can be extended to all traffic.
Large European airports use strategic flight plans to reduce the imbalance between air traffic capacity and demand. In these airports, flights are assigned an arrival or departure slot a few months before takeoff. In this regard, the researchers in [58] evaluated such strategic plans using predictions of arrival and departure delays and cancelations. The proposed approach was applied to London Heathrow Airport during 2013-2018, and the resulting flight plan was assessed in terms of predicted flight cancelation and delay using a machine learning approach. According to the findings, the proposed method was able to inform airport coordinators of the possible delays and cancelations related to the strategic plans. In [59], an end-to-end deep learning-based approach was also presented to increase accuracy in air traffic flow prediction using the CNN and RNN algorithms, and a convolutional LSTM module was proposed to construct a trainable model to predict the air traffic flow. The experimental results on actual data indicated superior performance compared to current approaches in predictive accuracy and stability.
Moreover, the proposed model could predict the flow distribution at various flight levels in the controlled airspace, which in turn improved the ATM level. The analysis of the distribution of prediction errors across spatial cells, flight levels, and prediction samples indicated that the spatial and temporal transmission patterns of flight flow in the ATM system could be thoroughly learned by the proposed model. The model's predictions can thus support air traffic management measures that improve system performance.
In [17], the researchers used bidirectional long short-term memory (BLSTM) on the performance data of air transportation management for system identification. In that system, BLSTM was able to reconstruct nonlinear time series and make valid predictions. According to the other findings of that study, deep neural networks can manage complicated nonlinear time series and learn to reconstruct them from multidimensional inputs, while also storing their knowledge of the behavior of the observed datasets.
In [60], a multi-step deep sequence learning model (Bi-LSTM + Seq2Seq) was proposed to predict airport delay based on the spatial and temporal relations of the other airports within the network. In the first step, the dataset was processed to analyze the correlation between the temporal delays of the airports based on complex network theory. Afterwards, the PageRank and k-means algorithms were applied to cluster the behavior of the networks and identify their overall status. At the next stage, the Bi-LSTM + Seq2Seq model was proposed and trained on the time-series data of the current network status and the delay in the interactions between the airports. The experimental results indicated that the suggested model had higher accuracy and sustainability than other prediction algorithms.
In [31], the main objective was to propose a deep learning-based method to evaluate the delays in inbound flights. Initially, the important features were extracted, followed by model training with artificial neural networks and DBNs using random samples. In that study, the researchers applied the momentum learning rate and resilient back propagation, which converges much faster than plain back propagation, thereby increasing the training pace and model convergence. Notably, the DBNs were based on Boltzmann machines, where each layer received connections from the previous layer, and a Boltzmann machine was added to the network at each stage. During training, the misclassification error rate decreased through fine-tuning of the parameters and the momentum learning rate. Since the output of each layer was divergent, the training pace decreased, and the gradient tended to zero.

Proposed method

The proposed model in the present study was based on the LSTM and ELM algorithms. Figure 2 depicts the flowchart of the proposed method. As observed in Fig. 2, the suggested method has three phases: (1) loading, normalization, and separation of the data; (2) creating a two-dimensional LSTM back-learning structure using a Bi-LSTM neural network and estimating the beta weights; and (3) training the ELM and calculating the assessment criteria. The proposed steps are further explained below.
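Since the third phase relies on an extreme learning machine, a brief sketch of ELM training may help: hidden-layer weights are drawn at random and kept fixed, and only the output weights (the "beta" weights) are solved in closed form by a least-squares pseudoinverse. This is a generic ELM illustration, not the authors' exact configuration; the hidden size, activation, and toy regression target are assumptions.

```python
import numpy as np

def train_elm(X, T, n_hidden=64, seed=0):
    """Basic ELM: random fixed hidden weights, beta solved by pseudoinverse."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    beta = np.linalg.pinv(H) @ T                     # least-squares output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# toy regression target: y = sum of the five input features
X = np.random.default_rng(1).random((200, 5))
T = X.sum(axis=1, keepdims=True)
W, b, beta = train_elm(X, T)
pred = predict_elm(X, W, b, beta)
print(float(np.mean((pred - T) ** 2)))  # small training MSE
```

The absence of iterative back propagation is what makes ELM "fast-learning": training cost is dominated by one pseudoinverse.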

Uploading, normalization, and separation of the data

At this phase, the dataset obtained from [53], which contained 100,000 records and five features, was loaded, and the Min–Max normalization approach shown in Eq. 1 was applied in order to facilitate the comparison of the results.
$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$$
In the equation above, $x_{min}$ and $x_{max}$ are the minimum and maximum of the feature, respectively, $x$ represents the raw feature value, and $x_{norm}$ is the normalized feature value.
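Eq. 1 can be applied per feature in a couple of lines. The values below are illustrative, since the dataset itself is not reproduced here:

```python
import numpy as np

def min_max_normalize(x):
    """Scale a 1-D feature vector into [0, 1] using Eq. 1."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

delays = np.array([5.0, 20.0, 35.0, 50.0])   # illustrative raw feature values
print(min_max_normalize(delays))              # scales to 0, 1/3, 2/3, 1
```

In practice $x_{min}$ and $x_{max}$ should be computed on the training split only and reused for the test split, so that no test information leaks into the scaling.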

Creating a two-dimensional LSTM back learning structure and using a Bi-LSTM neural network

At this stage, the initial weights of the ELM neural network were created using a Bi-LSTM neural network structure. GRU and LSTM serve the same purpose: capturing long-term dependencies while overcoming the vanishing and exploding gradient problem. The LSTM does this through three gates: a forget gate that controls how much information is removed, an input gate that controls how much of the cell state is stored, and an output gate that controls how much of the cell state is passed to the next cell [61, 62].
The LSTM network architecture was originally developed by Hochreiter and Schmidhuber [31, 60]. In this structure, an input sequence vector x = (x 1, x 2,…, x n) is provided, where n is the sequence length. The primary structure of the LSTM is based on three control gates that regulate a memory cell activation vector. The forget gate determines how much of the cell value c t-1 from the previous time step is retained in the current cell state c t; the input gate determines how much of the network input x t is stored in the current cell state c t; and the output gate determines to what extent c t is transferred to the current output value of the LSTM network. Each of the three gates is a fully connected layer whose input is a vector and whose output is a real number. Figure 3 demonstrates the initial structure of the LSTM cell, which is interpreted as follows:
$$\begin{aligned} \text{Input gate:} \quad & i_{t} = \sigma\left(W_{ix} x_{t} + W_{ih} h_{t-1} + b_{i}\right) \\ \text{Forget gate:} \quad & f_{t} = \sigma\left(W_{fx} x_{t} + W_{fh} h_{t-1} + b_{f}\right) \\ \text{Output gate:} \quad & o_{t} = \sigma\left(W_{ox} x_{t} + W_{oh} h_{t-1} + b_{o}\right) \\ \text{Cell state:} \quad & c_{t} = f_{t} * c_{t-1} + i_{t} * \tanh\left(W_{cx} x_{t} + W_{ch} h_{t-1} + b_{c}\right) \\ \text{Cell output:} \quad & h_{t} = o_{t} * \tanh\left(c_{t}\right) \end{aligned}$$
In the equations above, σ is the logistic sigmoid function, x t is the t-th input vector of the sequence, and h t is the hidden state. W and b denote the weight matrices (e.g., W fx is the weight matrix from the input to the forget gate) and the bias vectors (e.g., b i is the input gate bias vector), respectively, for the three gates. To overcome the shortcoming of a single LSTM cell, which can only record past context but cannot use future context, two hidden LSTM layers running in opposite directions were combined on the same output in the BRNN neural networks. With this structure, the output layer can use relevant information from both past and future contexts.
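The five gate equations can be transcribed almost literally into NumPy. The sketch below is illustrative only (the dimensions and random parameters are assumptions, and a trained network would learn these weights):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following the gate equations above."""
    i_t = sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["b_i"])  # input gate
    f_t = sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["b_f"])  # forget gate
    o_t = sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["b_o"])  # output gate
    c_t = f_t * c_prev + i_t * np.tanh(p["W_cx"] @ x_t + p["W_ch"] @ h_prev + p["b_c"])
    h_t = o_t * np.tanh(c_t)                                        # cell output
    return h_t, c_t

rng = np.random.default_rng(0)
d_in, d_hid = 4, 3
p = {f"W_{g}x": rng.standard_normal((d_hid, d_in)) * 0.1 for g in "ifoc"}
p.update({f"W_{g}h": rng.standard_normal((d_hid, d_hid)) * 0.1 for g in "ifoc"})
p.update({f"b_{g}": np.zeros(d_hid) for g in "ifoc"})

h, c = np.zeros(d_hid), np.zeros(d_hid)
h, c = lstm_step(rng.standard_normal(d_in), h, c, p)
print(h.shape, c.shape)  # (3,) (3,)
```

Note that because the output gate lies in (0, 1) and tanh is bounded, the cell output h_t always stays inside (-1, 1), while the cell state c_t itself is unbounded and carries the long-term memory.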
Moreover, the BiLSTM processed the input sequence x = (x 1, x 2,…, x n) in both directions, producing a forward hidden sequence h→ = (h→ 1, h→ 2,…, h→ n) and a backward hidden sequence h← = (h← 1, h← 2,…, h← n). The encoded vector y t aggregated the final forward and backward outputs:
$$\begin{aligned} y_{t} & = \left[h^{\rightarrow}_{t}, h^{\leftarrow}_{t}\right], \\ h^{\rightarrow}_{t} & = \sigma\left(W_{h^{\rightarrow} x} x_{t} + W_{h^{\rightarrow} h^{\rightarrow}} h^{\rightarrow}_{t-1} + b_{h^{\rightarrow}}\right), \\ h^{\leftarrow}_{t} & = \sigma\left(W_{h^{\leftarrow} x} x_{t} + W_{h^{\leftarrow} h^{\leftarrow}} h^{\leftarrow}_{t-1} + b_{h^{\leftarrow}}\right), \\ y_{t} & = W_{y h^{\rightarrow}} h^{\rightarrow}_{t} + W_{y h^{\leftarrow}} h^{\leftarrow}_{t} + b_{y}, \end{aligned}$$
In the equations above, y = (y 1, y 2,…, y t,…, y n) is the output sequence of the first hidden layer. Previous findings suggest that classification or regression performance can be further improved by stacking multiple BiLSTMs in a neural network [ 60]. In addition, theoretical evidence suggests that a deep hierarchical model can deliver some functions more efficiently than a shallow one. In the present study, a stacked BiLSTM network was defined, in which the output y t of a lower layer is fed as the input of the upper layer, and the forward and backward hidden states of each layer are combined as
$${\text{h}}_{{\text{t}}} = {\text{W}}_{{{\text{hh}} \to }} {\text{h}}^{ \to }_{{\text{t}}} + {\text{W}}_{{{\text{hh}} \leftarrow }} {\text{h}}^{ \leftarrow }_{{\text{t}}} + {\text{b}}_{{\text{h}}},$$
The stacked BiLSTM structure is shown in Fig.  4. Let Q = ( q 1, q 2, …, q n) denote the question (problem) sequence and A = ( a 1, a 2, …, a m) the answer (response) sequence, where n and m are their lengths and q t and a t are their t-th words. The stacked BiLSTM was applied to both sequences to obtain the hidden-state matrices H Q and H A. The calculations are as follows:
$$\begin{gathered} {\text{h}}^{{\text{q}}}_{{\text{t}}} = {\text{sBiLSTM}}\left( {{\text{h}}^{{\text{q}}}_{{{\text{t}} - 1}} ,{\text{h}}^{{\text{q}}}_{{{\text{t}} + 1}} ,{\text{q}}_{{\text{t}}} } \right),\quad {\text{h}}^{{\text{q}}}_{0} = 0, \hfill \\ {\text{h}}^{{\text{a}}}_{{\text{t}}} = {\text{sBiLSTM}}\left( {{\text{h}}^{{\text{a}}}_{{{\text{t}} - 1}} ,{\text{h}}^{{\text{a}}}_{{{\text{t}} + 1}} ,{\text{a}}_{{\text{t}}} } \right),\quad {\text{h}}^{{\text{a}}}_{0} = {\text{h}}^{{\text{q}}}_{{\text{n}}}, \hfill \\ \end{gathered}$$
$$\begin{gathered} {\text{H}}_{{\text{Q}}} = \left[ {{\text{h}}^{{\text{q}}}_{1} ,{\text{h}}^{{\text{q}}}_{2} , \ldots ,{\text{h}}^{{\text{q}}}_{{\text{n}}} } \right] \in {\text{R}}^{{{\text{d}} \times {\text{n}}}} , \hfill \\ {\text{H}}_{{\text{A}}} = \left[ {{\text{h}}^{{\text{a}}}_{1} ,{\text{h}}^{{\text{a}}}_{2} , \ldots ,{\text{h}}^{{\text{a}}}_{{\text{m}}} } \right] \in {\text{R}}^{{{\text{d}} \times {\text{m}}}} , \hfill \\ \end{gathered}$$
where d represents the dimension of the hidden states (Fig. 5). From these matrices, the propensity-score matrix used in the coherence mechanism is computed as
$${\text{L}} = {\text{H}}^{{\text{T}}}_{{\text{A}}} {\text{H}}_{{\text{Q}}} \in {\text{R}}^{{{\text{m}} \times {\text{n}}}}$$
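The bidirectional recurrence defined above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: a plain tanh cell stands in for the full LSTM gating, and all weight names and dimensions are illustrative assumptions.

```python
import numpy as np

def bidirectional_pass(xs, Wf, Uf, bf, Wb, Ub, bb, Wyf, Wyb, by):
    """Run forward and backward recurrences over a sequence and combine them,
    mirroring y_t = W_yh-> h->_t + W_yh<- h<-_t + b_y (tanh cell, not full LSTM)."""
    T, d = len(xs), bf.shape[0]
    h_fwd = np.zeros((T, d))
    h_bwd = np.zeros((T, d))
    h = np.zeros(d)
    for t in range(T):                         # forward pass: reads x_1 .. x_T
        h = np.tanh(Wf @ xs[t] + Uf @ h + bf)
        h_fwd[t] = h
    h = np.zeros(d)
    for t in reversed(range(T)):               # backward pass: reads x_T .. x_1
        h = np.tanh(Wb @ xs[t] + Ub @ h + bb)
        h_bwd[t] = h
    # combine both directions into the output sequence y
    return h_fwd @ Wyf.T + h_bwd @ Wyb.T + by
```

Stacking BiLSTM layers then amounts to feeding the returned sequence as `xs` of the next layer.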

Coherence mechanism for problem presentation

In this section, a coherence mechanism was implemented to encode the problem in accordance with the response sequence (Fig.  6). Through matrix multiplication, the mechanism lets the question and response representations interact more closely. Initially, the matrix multiplication was carried out to estimate the L matrix, which contains the propensity scores for all pairs of question and response words.
The softmax function was used to normalize the vector elements and is effective for multi-class classification and distribution problems. Therefore, column-wise and row-wise softmax functions were used to generate attention weights for the hidden states of the question and response separately, by the following equations:
$$\begin{gathered} {\text{A}}^{{\text{Q}}} = {\text{softmax}}\left( {\text{L}} \right) \in {\text{R}}^{{{\text{m}} \times {\text{n}}}} , \hfill \\ {\text{A}}^{{\text{A}}} = {\text{softmax}}\left( {{\text{L}}^{{\text{T}}} } \right) \in {\text{R}}^{{{\text{n}} \times {\text{m}}}} , \hfill \\ \end{gathered}$$
In order to obtain the attention vector of the question with respect to each word of the response, the attention weights were combined with the hidden-state matrices to compute the new context matrices C Q and C A, which result from the interaction between the question and the response vectors, as follows:
$$\begin{gathered} {\text{C}}^{{\text{Q}}} = {\text{H}}_{{\text{A}}} {\text{A}}^{{\text{Q}}} \in {\text{R}}^{{{\text{d}} * {\text{n}}}} \hfill \\ {\text{C}}^{{\text{A}}} = {\text{H}}_{{\text{Q}}} {\text{A}}^{{\text{A}}} \in {\text{R}}^{{{\text{d}} * {\text{m}}}} \hfill \\ \end{gathered}$$
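The coherence computation can be sketched compactly in NumPy. The dimensions d, n, m are arbitrary; the softmax is applied column-wise, matching the dimensions C^Q ∈ R^{d×n} and C^A ∈ R^{d×m} above (an interpretation of the equations, not the authors' code):

```python
import numpy as np

def softmax_cols(M):
    """Column-wise softmax (numerically stabilized)."""
    e = np.exp(M - M.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def coherence(H_Q, H_A):
    """Propensity matrix L and attended context matrices C^Q, C^A."""
    L = H_A.T @ H_Q        # (m, n) propensity scores for all word pairs
    A_Q = softmax_cols(L)  # attention over response words, per question word
    A_A = softmax_cols(L.T)  # attention over question words, per response word
    C_Q = H_A @ A_Q        # (d, n) question context
    C_A = H_Q @ A_A        # (d, m) response context
    return L, C_Q, C_A
```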

Attention mechanism (accuracy) to display the problem

A soft attention layer can be used to integrate information from the words of the question and response in order to reduce the information loss of the stacked BiLSTM [ 60, 63]. In the proposed model, the attention mechanism was applied to the coherence output. In the current research, C Q t denotes the t-th context vector of the question, and max pooling converts this input into a fixed-length vector O q. In addition, the softmax weights of all the response context vectors (C A 1, C A 2,…, C A m) are learned with respect to O q through the attention mechanism, and the weighted vector O a serves as the final representation of the response.
$$\begin{aligned} & {\text{O}}_{{\text{q}}} = {\text{max}}_{{0 < {\text{t}} \le {\text{n}}}} {\text{C}}^{{\text{Q}}}_{{\text{t}}} , \\ & {\text{M}}_{{{\text{aq}}}} \left( {\text{t}} \right) = {\text{tanh}}\left( {{\text{W}}_{{{\text{am}}}} {\text{C}}^{{\text{A}}}_{{\text{t}}} + {\text{W}}_{{{\text{qm}}}} {\text{O}}_{{\text{q}}} } \right), \\ & {\text{S}}_{{{\text{aq}}}} \left( {\text{t}} \right) \propto {\text{exp}}\left( {{\text{w}}^{{\text{T}}}_{{{\text{ms}}}} {\text{M}}_{{{\text{aq}}}} \left( {\text{t}} \right)} \right), \\ & {\text{O}}_{{\text{a}}} = \mathop \sum \limits_{{{\text{t}} = 1}}^{{\text{m}}} {\text{C}}^{{\text{A}}}_{{\text{t}}}\, {\text{S}}_{{{\text{aq}}}} \left( {\text{t}} \right) \\ \end{aligned}$$
In the equations above, W am and W qm are the attention matrices of C A t and O q, respectively, and w ms is the attention weight vector. The final representation O a of the response is determined by the attention weight S aq (t) of the t-th response context vector, which is normalized by the softmax function so that it is proportional to the relevance of C A t. Higher values of S aq (t) indicate a stronger correlation between C A t and the question, so that vector receives more attention (Fig.  7).
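The pooling and attention steps above can be sketched as follows (NumPy; the matrix shapes, including the hypothetical inner dimension k of the attention matrices, are illustrative assumptions):

```python
import numpy as np

def attention_pool(C_Q, C_A, W_am, W_qm, w_ms):
    """Max-pool the question context and attend over response context vectors,
    following O_q, M_aq(t), S_aq(t), and O_a in the text."""
    O_q = C_Q.max(axis=1)                            # (d,) max over question positions
    M = np.tanh(W_am @ C_A + (W_qm @ O_q)[:, None])  # (k, m) attention features
    s = np.exp(w_ms @ M)                             # unnormalized S_aq(t)
    S = s / s.sum()                                  # softmax over response positions
    O_a = C_A @ S                                    # (d,) weighted response vector
    return O_q, O_a, S
```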

Calculation of beta weights and ELM training

Compared to BP networks, the ELM network lacks an output-layer bias. While the input weights and hidden-layer biases of a standard ELM are generated randomly, in the present study the weights obtained from the BiLSTM network were applied at this stage, so that only the output weights had to be determined. This avoids the manual adjustment of per-layer parameters required in a BP neural network and improves predictive accuracy. Figure  8 depicts the structure of the ELM.
As can be seen, x 1, x 2,…, x n are the inputs of the training data; w ij are the input weights of the neural network, and β jk is the output weight vector between the hidden layer and the output nodes, so the output of the hidden layer corresponds to the input x. Here, b j is the threshold of the j-th hidden neuron, and L is the number of hidden-layer neurons. The training sample set is {( x i, y i) | x i ∈ R n, y i ∈ R m, i = 1, 2,…, N}. The activation function is denoted by g(x); in the current research, the sigmoid was selected as g(x), and the ELM model is defined as follows:
$$\mathop \sum \limits_{i = 1}^{L} \beta_{i}\, g\left( {w_{i} x_{j} + b_{i} } \right) = o_{j},\quad j = 1, \ldots ,N$$
The matrix was equal to:
$${\text{H}}\upbeta = {\text{Y}}$$
In the equation above
$${\varvec{\beta}} = \left[ {\varvec{\beta}}_{1} ,{\varvec{\beta}}_{2} , \ldots ,{\varvec{\beta}}_{{\varvec{L}}} \right]_{{{\varvec{L}} \times {\varvec{m}}}}^{{\varvec{T}}},\quad {\mathbf{Y}} = \left[ {\varvec{y}}_{1} ,{\varvec{y}}_{2} , \ldots ,{\varvec{y}}_{{\varvec{N}}} \right]_{{{\varvec{N}} \times {\varvec{m}}}}^{{\varvec{T}}},\quad {\mathbf{H}} = \left[ {\begin{array}{*{20}c} {{\varvec{g}}\left( {{\varvec{w}}_{1} {\varvec{x}}_{1} + {\varvec{b}}_{1} } \right)} & \cdots & {{\varvec{g}}\left( {{\varvec{w}}_{{\varvec{L}}} {\varvec{x}}_{1} + {\varvec{b}}_{{\varvec{L}}} } \right)} \\ \vdots & \ddots & \vdots \\ {{\varvec{g}}\left( {{\varvec{w}}_{1} {\varvec{x}}_{{\varvec{N}}} + {\varvec{b}}_{1} } \right)} & \cdots & {{\varvec{g}}\left( {{\varvec{w}}_{{\varvec{L}}} {\varvec{x}}_{{\varvec{N}}} + {\varvec{b}}_{{\varvec{L}}} } \right)} \\ \end{array} } \right]_{{{\varvec{N}} \times {\varvec{L}}}}$$
Equation  14 is equivalent to the following least-squares minimization:
$$\hat{\beta } = \mathop {argmin}\limits_{\beta } \left\| {H\beta - Y} \right\|_{F}$$
Equation  15 was solved as:
$$\beta = H^{ + } {\text{Y}}$$
where H + is the Moore–Penrose generalized inverse of the hidden-layer output matrix H. In the final step, we examined the evaluation criteria.

Analysis and evaluation

At this stage of the research, we analyzed and evaluated the applied data and assessed the results against the evaluation criteria.


The dataset obtained from [ 1, 54] included 100,000 records and 15 features, as listed in Table 1.
Table 1
Applied dataset
The airport from which the flight takes off
The airport where the flight lands
Airline operating the flight
Month of the flight date
flight date
Flight day (number per week)
Flight number registered in the flight plan
Time of departure of the aircraft from the runway
Scheduled time for flight departure in flight schedule
Estimated time for the aircraft to leave the gate or take off
Flight time in the flight cruise section
Flight distance between the airport of origin and the airport of destination
Flight landing time at the destination airport
Flight delay rate in flight landing
Flight delay in departure
The U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled, and diverted flights is published in DOT's monthly Air Travel Consumer Report and in datasets of flight delays and cancellations covering several periods (e.g., 2015, 2005–2015, 2010–2015, and 2005–2008).
Each entry of the flights.csv file corresponds to a single flight. Several versions of this dataset exist for different time spans; for example, the version recorded in 2015 contains more than 5,800,000 flights, each described by 31 variables. These variables are described as follows:
YEAR, MONTH, DAY, DAY_OF_WEEK: dates of the flight.
AIRLINE: An identification number assigned by US DOT to identify a unique airline.
ORIGIN_AIRPORT and DESTINATION_AIRPORT: code attributed by IATA to identify the airports.
SCHEDULED_DEPARTURE and SCHEDULED_ARRIVAL: scheduled times of take-off and landing.
DEPARTURE_TIME and ARRIVAL_TIME: real times at which take-off and landing took place.
DEPARTURE_DELAY and ARRIVAL_DELAY: difference (in minutes) between planned and real times.
DISTANCE: distance (in miles) between the origin and destination airports. An additional file of this dataset, airports.csv, gives a more exhaustive description of the airports.

Assessable criteria

It was crucial to test and evaluate the results with a set of criteria to assess the performance of the proposed method. In general, the confusion matrix is used to evaluate the efficiency of classification systems. Analyzing the confusion matrix for the classification and detection of flight delays yields four outcomes: true positive, true negative, false positive, and false negative. Table 2 shows the position of these parameters in the confusion matrix.
Table 2
Confusion matrix
Real values
Predicted values
With delay
Without delay
With delay
TP (true positive)
FN (false negative)
Without delay
FP (false positive)
TN (true negative)
The elements of the matrix are described in Table 3. In addition, the criteria in Table 4 were used to evaluate the performance of the proposed method (Tables 3, 4).
Table 3
Confusion matrix description
The number of the behaviors that represented the existence of a delay and were correctly predicted by the model
The number of the behaviors that represented the presence of a delay, and the model incorrectly predicted the absence of delay
The number of the behaviors that indicated the absence of delay, and the model incorrectly predicted the existence of delay
The number of the behaviors that showed the lack of delay, and the model correctly predicted them
Table 4
Formulations description
\(Accuracy = \frac{TP + TN}{TP + TN + FP + FN}\)
This was the most important criterion for determining the performance of a classification algorithm, which showed the percentage of the proper classification of the total set of the experimental record
\({\text{Recall}} = \frac{TP}{{TP + FN}}\)
It showed the ability of the algorithm to accurately detect delay
\(Specificity = \frac{TN}{{FP + TN{ }}}\)
It demonstrated the efficiency of the classifier in the accurate prediction of the lack of delay
\({\text{Precision}} = \frac{TP}{{TP + FP{ }}}\)
It demonstrated the ability of the algorithm to detect the positive categories (i.e., delay)
\({\text{F - measure}} = \frac{{2*Recall*Precision{ }}}{{Precision + Recall{ }}}\)
It showed the harmonic mean between accuracy and recall
\({\text{RMSE}} = \sqrt {\mathop \sum \limits_{{{\text{t}} = 1}}^{{\text{n}}} \left( {{\text{y}}_{\text{t}} - {\hat{\text{y}}}_{\text{t}} } \right)^{2} /{\text{n}}}\)
Measuring the accuracy of the predicted rates compared to the correct rates
\({\text{MSE}} = \mathop \sum \limits_{{{\text{t}} = 1}}^{{\text{n}}} \left( {{\text{y}}_{\text{t}} - {\hat{\text{y}}}_{\text{t}} } \right)^{2} /{\text{n}}\)
It was a statistical tool to determine the predictive accuracy in modeling
If the class distributions in the dataset were not the same (class imbalance), this criterion was used to calculate the accuracy of the introduced method
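The criteria above can be computed directly from the confusion-matrix counts and the prediction errors; a minimal sketch (function names are illustrative):

```python
import math

def classification_metrics(tp, fn, fp, tn):
    """Compute the criteria of Table 4 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)            # ability to detect delays
    specificity = tn / (fp + tn)       # correct prediction of no delay
    precision = tp / (tp + fp)         # reliability of delay predictions
    f_measure = 2 * recall * precision / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "precision": precision,
            "f_measure": f_measure}

def mse_rmse(y_true, y_pred):
    """MSE and RMSE between observed and predicted values."""
    n = len(y_true)
    mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n
    return mse, math.sqrt(mse)
```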

Results and discussion

Flight delays and the problem of predicting delay duration involve several factors, conditions, and data sources. According to previous studies, flight delay causes can be classified as:
Delays due to flight planning and scheduling;
Delays due to flight operation conditions at the airport;
Delays due to weather conditions;
Delays due to the terms and conditions of airline aviation operations and air traffic control;
Delays due to temporary conditions, such as the flight season or day;
Delays due to the flight conditions of the national flight network;
Delays due to the flight atmosphere
Since the types of delay considered in the present study were numbers one (delays due to flight planning and scheduling), two (delays due to flight operation conditions at the airport), and six (delays due to the flight conditions of the national flight network), the delay time slots were set to less than 15 min and 15–30 min based on the mentioned findings. A radar squawk is assigned to a flight when the aircraft announces its readiness to fly at the time specified in the flight schedule; if departure occurs within 15 min, the flight continues with the same squawk and flight sequence. Otherwise, the squawk is canceled, and the flight must request a new squawk from the country's air control center, which changes the flight schedule. If the delay is more than 15 min but less than 30 min, the flight can proceed with the same schedule under a new squawk. In case of a delay of more than 30 min, the flight must send a flight delay message to the national air traffic network or set and send a new flight schedule.
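The squawk and schedule rule described above can be summarized as a simple decision function (a hypothetical sketch of the described procedure, not an operational system):

```python
def squawk_decision(delay_min):
    """Map a departure delay (in minutes) to the action described in the text."""
    if delay_min < 15:
        return "keep squawk and flight sequence"
    if delay_min <= 30:
        return "request new squawk, keep flight schedule"
    return "send delay message or file a new flight schedule"
```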

Calculate MSE and RMSE

MSE is a statistical tool applied to determine the predictive accuracy of a model; it estimates the difference between the values predicted by a model and the observed values [ 1, 53]. Table 5 shows the mean squared error (MSE) and root mean square error (RMSE) for the studied airports. A model is more accurate when its MSE for a given airport is lower than that of the other models. The criteria considered in the proposed method for the two delay classes (15 and 30 min) across 10 airports are presented in Table 5.
Table 5
MSE and RMSE criteria
MSE (Delay 15)
MSE (Delay 30)
RMSE (Delay 15)
RMSE (Delay 30)
In other words, the higher the predictive accuracy of a model, the lower its MSE. The RMSE for 30-min delays at the ORD, PHX, and JFK airports was lower than at the other airports, mainly because these airports required fewer traffic data, especially PHX. In the case of the PHX Airport, the amount of air traffic data did not exceed the threshold value, while the traffic data for the other airports exceeded it [ 54].


The main purpose of this study is to increase accuracy. To evaluate the proposed method more thoroughly, different situations were considered based on three scenarios:
  • First scenario: 80% of data for learning and the remaining 20% for testing
  • Second scenario: 60% of data for learning and the remaining 40% for testing
  • Third scenario: 70% of data for learning and the remaining 30% for testing
Comparing all three scenarios shows that the third scenario (70/30 split) had the highest accuracy (Table 6).
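The three splitting scenarios above can be sketched as follows (a generic shuffle-and-split illustration in NumPy, not the authors' experimental harness):

```python
import numpy as np

def split_scenario(X, y, train_frac, seed=0):
    """Shuffle the data and split it into train/test sets for a given scenario."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(train_frac * len(X))
    train, test = idx[:cut], idx[cut:]
    return X[train], y[train], X[test], y[test]

# Scenario 1: 80/20, Scenario 2: 60/40, Scenario 3: 70/30
SCENARIOS = {1: 0.8, 2: 0.6, 3: 0.7}
```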
Table 6
Accuracy criteria for 15- and 30-min delays
Scenario 1
Delay 15
Delay 30
Scenario 2
Delay 15
Delay 30
Scenario 3
Delay 15
Delay 30
According to the findings, the LSTM-ELM hybrid method could detect the delay with an accuracy of 96.27%. Accuracy varied between the 15- and 30-min intervals at the various airports. According to the obtained results, the accuracy for 30-min delays was highest at the ATL Airport. Nevertheless, the accuracy was acceptable at the other airports as well. The other delay criteria for the airports are compared in Table 7.
Table 7
Comparison of evaluated criteria for 15- and 30-min delays
Delay 15
Delay 30
Delay 15
Delay 30
Delay 15
Delay 30
Delay 15
Delay 30
Delay 15
Delay 30
Delay 15
Delay 30
Delay 15
Delay 30
Delay 15
Delay 30
Delay 15
Delay 30
Delay 15
Delay 30
The first evaluated criterion was accuracy. As observed, the 30-min delay class had a higher accuracy percentage. One reason is that, when estimating 30-min delays, the delay is obtained and calculated from flight operations in the TMA control space, which adds to the previously accumulated delays and could not otherwise be estimated.
Moreover, in delays of 30 min and more, the recorded information is more accurate since the order of flight arrival and departure numbers changes according to the order intended for the flight with the airport control mechanism and it is necessary to send a flight delay message or a flight plan update.
The second criterion evaluated in Table 7 was recall, which achieved a better percentage for 15-min delays at the LAX Airport. Advantages of this airport's data included lower noise levels; LAX also has the largest number of flights and a higher operating volume compared to the nearby airports. The recall of the system in the obtained estimate supports the detection and reduction of human errors, operating-system faults, aviation accidents, and operational and airport costs. In addition, the three criteria of balanced accuracy, MCC, and F-measure performed better for 30-min delays. The use of the BiLSTM algorithm and the improvement of the ELM parameters generalized properly to 30-min delays, and the improved ELM increases accuracy and precision in the training and testing phases compared to the other airports. It can therefore be concluded that the effect of the delay was properly modeled using the proposed method. In general, the improved ELM algorithm is faster, more accurate, and more generalizable in classification than the other algorithms.
According to Table 7, the cause of 40% of the delays has been recorded at the airports; the most important cause was air time, followed by delayed arrival. Except for the arrival-delay factor, each delay factor alone can produce several arrival delays at subsequent airports. Therefore, a significant part of the delay factors relates to delayed arrivals, which is resolved once airlines have the time required to recover and return to the flight schedule. At present, the causes of delays of less than 15 min and of departure delays are not recorded at most airports, whereas the information recorded for delays between 15 and 30 min is more thorough, which leads to higher accuracy and precision (Table 8).
Table 8
Comparison of evaluated criteria
Accuracy [ 1]
Accuracy [ 54]
Accuracy proposed

Comparison of the proposed method with conducted research

Improving the accuracy and precision of ATM is a fundamental goal of ATM research. Several ATM approaches have been proposed at the ATC level; however, their accuracy under traffic was low, and they did not cope with heavy traffic.
In the present study, an LSTM-ELM hybrid model was applied to improve accuracy. The comparison of the proposed approach for the 30-min delay with the studies of [ 1] and [ 54] is shown in Table 8.
According to the obtained results, the proposed method achieved a more substantial performance improvement than the compared references due to the reconstruction of nonlinear time series and valid predictions; it can therefore manage complex nonlinear time series. The BiLSTM algorithm requires fewer hidden layers owing to its greater learning capability, and improving the ELM network further enhances accuracy in predicting air traffic delay. Unlike algorithms such as BP, the ELM needs no iterative tuning of its hidden layer, whose parameters are selected randomly; the goal of the algorithm is to achieve the lowest training error together with the smallest output weight norm. Furthermore, the improvement of this algorithm avoids local minima, while BiLSTM solves the long-term dependency problem. Together, these two algorithms improve accuracy more effectively than the other methods.


This paper proposed improved accuracy for ATM problems. ATM includes all the activities necessary for the safe and efficient management of the national aviation system, which is currently one of the most challenging problems at our country's airports. In this paper, the bidirectional LSTM algorithm is used to improve the accuracy of predicting 15- and 30-min delays, and also to improve the parameters of the ELM algorithm. The dataset used in this paper is taken from Kaggle, and the simulation was carried out in MATLAB. The results show a higher accuracy improvement rate in comparison with other papers, and also show that the RMSE for 30-min delays has a lower percentage at the ORD, PHX, and JFK airports in comparison with the other airports.
In further studies, other LSTM models such as Casc-LSTM and Ens2-LSTM can be used alongside the ELM algorithm to increase ATM accuracy. Ensembles of one-way and two-way LSTMs can also be combined with other algorithms.


The authors are thankful to the anonymous reviewers for their valuable comments and suggestions, which helped improve the quality of the paper.


Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


Automatic dependent surveillance-broadcast (ADS-B): is a surveillance technology in which an aircraft determines its position via satellite navigation and periodically broadcasts it, enabling it to be tracked.
Air traffic management (ATM): is an aviation term encompassing all systems that assist aircraft to depart from an aerodrome, transit airspace, and land at a destination aerodrome, including Air Traffic Services (ATS), Airspace Management (ASM), and Air Traffic Flow and Capacity Management (ATFCM).
Air Traffic Service (ATS): is a service which regulates and assists aircraft in real-time to ensure their safe operations. In particular, ATS is to:
  • prevent collisions between aircraft; provide advice of the safe and efficient conduct of flights;
  • conduct and maintain an orderly flow of air traffic;
  • notify concerned organizations of and assist in search and rescue operations.
Bidirectional LSTMs: are an extension of traditional LSTMs that can improve model performance on sequence classification problems. In problems where all time steps of the input sequence are available, bidirectional LSTMs train two instead of one LSTMs on the input sequence.
Elapsed flying time: Actual time an airplane spends in the air, as opposed to time spent taxiing to and from the gate and during stopovers.
Extreme learning machines (ELM): are feed-forward neural networks for classification, regression, clustering, sparse approximation, compression and feature learning with a single layer or multiple layers of hidden nodes, where the parameters of hidden nodes (not just the weights connecting inputs to hidden nodes) need not be tuned.
Long short-term memory (LSTM): is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. Unlike standard feed-forward neural networks, LSTM has feedback connections.
Terminal control area (TCA or TMA): A terminal control area (TMA, or TCA in the U.S. and Canada), also known as a terminal manoeuvring area (TMA) in Europe, is an aviation term to describe a designated area of controlled airspace surrounding a major airport where there is a high volume of traffic.
