The gas transmission network is run by transmission system operators, hereinafter referred to as TSOs. An integrated organization would jointly conduct operations including gas trading, running storage facilities and operating the transmission network. In that setting, the network capacities could influence transport requirements. However, the liberalization of the European gas markets [1], under which TSOs are no longer allowed to own, trade or store gas, poses novel challenges for TSOs in ensuring the security of supply. Instead, trading is conducted by independent companies to ensure non-discriminatory access to the transport network for all traders. Therefore, natural gas forecasting has become a fundamental input to the TSOs’ decision-making mechanisms. Meanwhile, the natural gas market is becoming more and more competitive and is moving towards more short-term planning, e.g., day-ahead contracts, which makes the dispatch of natural gas in the pipeline network even more challenging [12]. Therefore, accurate and high-frequency forecasting of local natural gas supplies and demands is essential for efficient network operation by TSOs.
Although on the contractual level all gas transports of a market area have to be balanced, this needs only to be achieved on average over time. Some outflow might actually only be balanced by an inflow at a later time.
Despite these challenges, TSOs need to meet all transport demands. The TSO has the obligation to monitor the situation, foresee possible shortages and react accordingly to ensure the security of supply. Since changes in gas networks happen rather slowly, it is extremely important to have accurate forecasts of the demand and supply in the network in order to react in time.
Models for natural gas demand forecasting mainly focus on long-term issues. There are quite a few publications on electricity demand forecasting (see, e.g., [3, 30, 32, 47]), but electricity behaves very differently from gas.
A survey of models for predicting natural gas consumption published between 1949 and 2010 is presented in [42], which shows that only a few works focus on hourly gas flow prediction. A more recent survey [51] considers 187 papers published between 2002 and 2017. The authors point out that the majority of works provide daily predictions and note that neural networks are the most widely used models. The authors also show that, over the considered period, most works were carried out at an aggregated level (i.e., country or city) and only three papers proposed models to forecast hourly gas consumption.
In [49], two neural networks were tested to forecast natural gas consumption based on historical data and environmental variables. The authors found better prediction accuracy with the multi-layer perceptron than with the radial basis function network. In [48], a model similar to a radial basis neural network was proposed to predict gas consumption in a distribution system. In this work, input variables were selected using a genetic algorithm. Residential hourly gas consumption was predicted with neural networks by [17]. In this work, the heating degree-hour method, which considers the gap between outdoor and indoor temperature, was used. The best hyper-parameter configuration consisted of 29 neurons, a feed-forward backpropagation algorithm and tangent, sigmoid and linear functions for the input, hidden and output layers, respectively. Similarly, [45] proposed neural networks to forecast residential natural gas demand. The proposed network consisted of a multi-layer perceptron with one hidden layer. The input features included calendar (i.e., month, day of the month, day of the week, hour) and weather (temperature) information. The authors found that the average prediction error was higher during the winter months because gas flow was higher. More recently, [23] compared several machine learning models to predict residential hourly natural gas demand and found that a recurrent neural network and linear regression were the most accurate models. The prediction results of monthly gas consumption of residential buildings using Extreme Learning Machine (ELM), artificial neural networks (ANNs) and genetic programming (GP) were presented by [24]. The ELM is characterized by a higher training speed than backpropagation and was found to perform better, in terms of RMSE, than the other two techniques. In [26], the authors set up a two-stage methodology to predict the daily gas consumption of utility companies. In the first stage, two NNs are run in parallel to produce daily forecasts; in the second stage, a nonlinear transformation of some features of the input vector is performed. The combination of the two stages is based on several methods such as average forecast, recursive least squares, etc. The results show that the mix of the two forecasters has higher accuracy, although combining the two models increases the complexity. Overall, these works show that the consumer profile is very important when forecasting gas flow. In this regard, [38] identified seventeen group profiles based on historical consumption and predicted daily gas demand. The overall prediction was obtained by combining the single predictions.
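The calendar and weather inputs described above, together with the heating degree-hour idea, can be sketched as a small feature builder. This is only an illustration: the function names, the feature keys and the 18 °C indoor reference temperature are assumptions made for this example and are not taken from the cited works.

```python
from datetime import datetime


def heating_degree_hour(outdoor_temp_c, base_temp_c=18.0):
    """Positive gap between an assumed indoor reference and the outdoor temperature."""
    return max(0.0, base_temp_c - outdoor_temp_c)


def build_features(timestamp, outdoor_temp_c, base_temp_c=18.0):
    """Calendar and weather features for one hourly observation."""
    return {
        "month": timestamp.month,
        "day_of_month": timestamp.day,
        "day_of_week": timestamp.weekday(),  # Monday = 0
        "hour": timestamp.hour,
        "temperature": outdoor_temp_c,
        "hdh": heating_degree_hour(outdoor_temp_c, base_temp_c),
    }


# Hypothetical cold winter morning: Wednesday, 15 January 2020, 07:00.
features = build_features(datetime(2020, 1, 15, 7), outdoor_temp_c=-2.5)
```

A feature dictionary of this kind would be built for each hour and passed to whichever regression model or neural network is being trained.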
The backpropagation algorithm optimized with a genetic algorithm was implemented by [54] to increase the training speed and to reach a global minimum. The authors predict next-day gas loads based on temperature and weather conditions. Furthermore, the authors tested the algorithm on a real three-year dataset recorded in Shanghai to predict the gas load over one and a half months. Similarly, [55] propose a recurrent neural network to predict daily gas flow. The Output-Input-Hidden Feedback-Elman neural network takes into account not only the feedback of the hidden nodes but also the feedback of the output nodes. The results improved compared to those obtained with the standard Elman network. However, the authors recognize that further research is needed to forecast gas demand during holidays. In [4], an adaptive network-based fuzzy inference system (ANFIS) consisting of a neural network integrated with fuzzy logic was proposed to forecast short-term natural gas demand. The main advantage of this model was its ability to handle uncertainty, noise and non-linearity in the data; compared to standard neural network models, it provided more accurate results. A wavelet transform was used by [44] to decompose the hourly gas demand time series, and Bi-LSTM and LSTM networks were optimized using a genetic algorithm. The model was applied to winter data, on which it showed good prediction accuracy.
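To illustrate what decomposing a demand series with a wavelet transform means, the following minimal sketch performs one level of a Haar decomposition. The Haar wavelet is chosen here only because it is the simplest case; the cited work does not state which wavelet it uses, and the function names and sample values are hypothetical.

```python
import math


def haar_step(signal):
    """Split an even-length series into a smooth part and a fluctuation part."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the original series from the two components."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


# Hypothetical hourly demand values.
demand = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_step(demand)
rebuilt = haar_inverse(approx, detail)
```

In a hybrid scheme, each component would be forecast separately and the component forecasts recombined through the inverse transform.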
Several static and adaptive models have been tested by [37] for short-term gas consumption forecasting (random walk, temperature correlation model, linear regression model, ARX, adaptive (recursive) linear auto-regressive model (RARX), neural network (NN), recurrent NN and Support Vector Regression). They found that the best performance was obtained by the RARX of order 3. Furthermore, they found that nonlinear models such as neural networks and support vector machines had a lower generalization capacity than linear models. Finally, they concluded that the adaptive models performed better overall than the static models.
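An adaptive (recursive) ARX model of order 3 can be sketched with recursive least squares and a forgetting factor. The code below is an illustration under stated assumptions, not the implementation of the cited study: the class name, the forgetting factor of 0.99, the single exogenous temperature input and the synthetic noise-free data are all chosen for this example.

```python
import random


class RecursiveARX:
    """ARX model whose coefficients are updated online by recursive least squares."""

    def __init__(self, n_params, forgetting=0.99, p0=1000.0):
        self.theta = [0.0] * n_params  # coefficient estimates
        # Large initial covariance: low confidence in the initial coefficients.
        self.P = [[p0 if i == j else 0.0 for j in range(n_params)]
                  for i in range(n_params)]
        self.lam = forgetting  # values below 1 discount older observations

    def predict(self, phi):
        return sum(t * x for t, x in zip(self.theta, phi))

    def update(self, phi, y):
        n = len(phi)
        p_phi = [sum(self.P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(phi[i] * p_phi[i] for i in range(n))
        gain = [v / denom for v in p_phi]
        error = y - self.predict(phi)  # one-step-ahead prediction error
        self.theta = [t + g * error for t, g in zip(self.theta, gain)]
        # P is symmetric, so phi^T P equals (P phi)^T.
        self.P = [[(self.P[i][j] - gain[i] * p_phi[j]) / self.lam
                   for j in range(n)] for i in range(n)]
        return error


# Synthetic, noise-free data from a known order-3 ARX process (illustrative only).
true_theta = [0.5, 0.2, 0.1, -0.3]  # three flow lags, one temperature term
rng = random.Random(0)
flow = [0.0, 0.0, 0.0]
model = RecursiveARX(n_params=4)
for _ in range(500):
    temp = rng.uniform(-1.0, 1.0)
    phi = [flow[-1], flow[-2], flow[-3], temp]
    y = sum(c * x for c, x in zip(true_theta, phi))
    model.update(phi, y)
    flow.append(y)
```

Because the coefficients are re-estimated at every step, such a model can follow gradual changes in consumption behaviour, which is the property that distinguishes the adaptive from the static models in the comparison.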
The traditional approaches are regression and econometric models. In this regard, the performance of nonlinear mixed-effects, ARIMAX and ARX models in predicting the gas consumption of 62 residential and small commercial customers was assessed by [10]. The authors forecast the daily consumption of an entire month based on the previous 18 months. The time series included zero flows and missing data, which were excluded from the training process. The prediction performance was similar in terms of daily mean absolute error, which was close to zero for all the tested models. Thus, the authors propose to combine multiple models, although they recognize that this might be a difficult task because of the increased computational complexity. Multiple linear regression has been proposed by [40], who predicted annual gas consumption based on socio-economic variables (GDP and inflation in the case of Turkey) that were selected based on their statistical significance. Based on the forecast, the authors propose alternative energy policies. A robust least squares method combined with a log-linear Cobb–Douglas model has been proposed by [15]. The authors compared the proposed robust method with ordinary least squares for the yearly forecast of natural gas demand in Brazil, considering the total demand as well as the demand of the industrial and power sectors. The authors showed that the proposed model can be very useful when a large amount of past data, which is usually necessary for the calibration of more sophisticated forecast models, is not available.
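For readers unfamiliar with the log-linear Cobb–Douglas form, a generic version is sketched below. The explanatory variables (an income measure $Y_t$ and a price $P_t$) and all symbols are assumptions made for illustration; the cited work may use a different set of regressors.

```latex
D_t = A \, Y_t^{\alpha} \, P_t^{\beta} \, e^{\varepsilon_t}
\quad\Longrightarrow\quad
\ln D_t = \ln A + \alpha \ln Y_t + \beta \ln P_t + \varepsilon_t
```

Taking logarithms makes the model linear in the parameters $\ln A$, $\alpha$ and $\beta$, so they can be estimated by ordinary or robust least squares, and $\alpha$ and $\beta$ can be read as elasticities of demand.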
A hybrid model formed by a grey model and an autoregressive integrated moving average model has been proposed by [52] to predict monthly shale gas production. The authors conclude that the results of the combined model are more accurate than those of the single linear and nonlinear models.
In [33], Multivariate Adaptive and Conic Multivariate Adaptive Regression Splines were proposed to predict residential daily gas demand. The two models provided better results in terms of prediction errors (MAE and RMSE) than those obtained with linear regression and neural networks. In [41], the nonlinear characteristics of natural gas consumption are modeled with several Grey models, which are compared in predicting the yearly natural gas consumption in China. Nonlinear programming and a genetic algorithm have been proposed by [19] to predict natural gas consumption in the residential and commercial sectors on a yearly basis. Similarly, [25] proposed the breeder hybrid algorithm, which consists of three steps, for natural gas demand forecasting. In the first step, the coefficients of a nonlinear regression model are estimated. Subsequently, the estimates are improved using a genetic algorithm. Finally, the optimized coefficients are used as initial solutions for simulated annealing. Nearest neighbor and local regression were proposed by [6] to predict gas flow in a small gas network with a 15-minute resolution. The authors highlight the importance of environmental variables such as the temperature. Their method made it possible to detect anomalies and consumption patterns based on one year of historical data. In the literature, there are also combinations of several methods to predict day-ahead natural gas consumption. In [34], the time series were decomposed into low-frequency and high-frequency components using a wavelet transform. In a second step, a genetic algorithm and an Adaptive Neuro-Fuzzy Inference System were deployed to predict each of the decomposed time series. The output was finally fed into a feed-forward neural network to refine the prediction. The research focused on different types of natural gas distribution points. The authors obtained better prediction results using the data of distribution points located near the city center. Neural networks have also been compared with autoregressive models. In [46], for instance, short-term natural gas consumption in Turkey was predicted using a SARIMAX model, neural networks (multilayer and radial basis) and multivariate regression. They found that SARIMAX had better prediction performance. The temperature correlation model, proposed by [43], was compared with several configurations of ARX, stepwise regression, Support Vector Regression and neural networks. The authors found that SVR and NN performed better on the training set, while the high-order ARX model performed better on the test set. Support Vector Regression has been deployed with a false neighbours filtered approach to predict short-term natural gas consumption [56]. The local predictor was based on the nearest neighbour approach, using the Euclidean distance between the training and test data, and the neighbour filter was applied to determine the validity of the predicted values based on the exponential separation rate. The authors obtained better prediction performance than with ARIMA, neural networks and Support Vector Regression.
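The nearest neighbour idea mentioned above can be sketched as follows: the most recent window of observations is compared, by Euclidean distance, with all earlier windows, and the values that followed the closest ones are averaged. This is a plain k-nearest-neighbour forecaster only; it includes neither the false neighbours filter nor the Support Vector Regression step, and the function name, window length, k and sample series are assumptions for illustration.

```python
import math


def knn_forecast(series, window=3, k=2):
    """One-step-ahead forecast from the k past windows closest to the latest one."""
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    query = series[-window:]
    candidates = []
    # Every earlier window that has a known successor is a candidate neighbour.
    for start in range(len(series) - window):
        past = series[start:start + window]
        successor = series[start + window]
        dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(past, query)))
        candidates.append((dist, successor))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(s for _, s in nearest) / len(nearest)


# Hypothetical periodic flow pattern; the latest window is [1, 2, 3].
flows = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]
prediction = knn_forecast(flows, window=3, k=2)
```

In this toy series both matching windows were followed by the value 4, so the forecast reproduces the periodic pattern.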
Overall, the analyzed literature shows that few works focus on comparing methods to predict the hourly gas flow of different types of nodes in a gas network, or on combining the advantages of different forecasting methods in a hybrid model for hourly gas flows. Therefore, we propose a hybrid model based on optimisation and machine learning and compare its results with those of four different models to predict hourly gas flow. To address the heterogeneity of the time series for the different node types, we compare results obtained for four different types of nodes.