Introduction
-
The energy load data are highly complex and carry non-linear fluctuations due to several factors, such as measurement errors, unpredictable patterns, and anomalies. These sudden/abrupt variations in the demand data make it challenging to develop an efficient prediction model. To overcome this limitation, the current approach employs the Gaussian smoothing technique to improve data quality before feeding it into the learning model. Gaussian smoothing aims to remove irregularities and inconsistencies in the data, thus contributing to the generalization, reliability, and accuracy of the prediction model.
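The smoothing step can be sketched in plain numpy as a convolution with a normalized Gaussian kernel (a minimal illustration, not the exact implementation used here; the kernel width `sigma` and the synthetic load series are assumptions):

```python
import numpy as np

def gaussian_smooth(series, sigma=2.0, radius=None):
    """Smooth a 1-D load series with a normalized Gaussian kernel."""
    if radius is None:
        radius = int(3 * sigma)  # truncate the kernel at ~3 standard deviations
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()  # normalize so the overall level is preserved
    # mode="same" keeps the output aligned with the input timestamps
    return np.convolve(series, kernel, mode="same")

# Synthetic hourly load: a daily sinusoid plus random fluctuations
rng = np.random.default_rng(0)
t = np.arange(240)
load = 100 + 20 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 5, t.size)
smoothed = gaussian_smooth(load, sigma=2.0)
```

The smoothed series has the same length as the input, while the hour-to-hour jitter is visibly damped.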
-
The existing data decomposition techniques, such as wavelet, Fourier transform, and mode decomposition techniques, exhibit several limitations, such as noise sensitivity, dependence on the sifting algorithm, and the need to decide the optimal number of modes. To address these limitations, the current research integrates CEEMDAN with neural models to achieve improved results.
-
Efficiently capturing historical relationships within the load time-series observations is crucial for accurate predictions. In this context, the current research further integrates an attention mechanism with data decomposition, smoothing, and neural models to extract the relevant information while reducing the impact of irrelevant noise or errors. The attention mechanism enables the model to emphasize the extracted relevant information by assigning it greater weight during the model-building phase.
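The weighting idea can be illustrated with a minimal additive-attention sketch in numpy (the projection `w` and scoring vector `v` stand in for parameters that would be learned; the shapes and names are illustrative, not this work's implementation):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_pool(hidden_states, w, v):
    """Additive attention: score each timestep, softmax the scores,
    and return the weighted sum of the hidden states.
    hidden_states: (T, H) outputs of a (bi-directional) recurrent encoder."""
    scores = np.tanh(hidden_states @ w) @ v   # (T,) one relevance score per timestep
    weights = softmax(scores)                 # (T,) attention weights, sum to 1
    context = weights @ hidden_states         # (H,) weighted summary vector
    return context, weights

rng = np.random.default_rng(1)
T, H = 24, 8                                  # e.g. 24 lag hours, 8 hidden units
h = rng.normal(size=(T, H))
w = rng.normal(size=(H, H)) * 0.1
v = rng.normal(size=H)
context, weights = attention_pool(h, w, v)
```

Timesteps with higher scores contribute more to the context vector, which is what lets the model emphasize relevant history over noise.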
-
Lastly, the current research provides a novel dataset describing the load patterns of the southern states of India. The proposed approach’s performance is evaluated on this novel dataset using widely adopted evaluation measures. The evaluation results demonstrate the efficacy of the proposed approach in estimating load patterns in the specific context of the southern states of India.
Literature
Traditional, machine learning, and deep neural-based forecasting models
Hybrid techniques for load forecasting
Working methodology description
Data preprocessing (Step-II)
-
Data cleaning: This stage aims to rectify inconsistencies in the dataset, like missing records and inconsistent values. In the past, several data cleaning techniques, such as imputation by central tendency/regression techniques/clustering, have been employed to improve data quality [53]. However, in the context of the problem under consideration, a missing-value imputation approach that considers the historical dependencies is more suitable. Therefore, the current research employs a missing-value imputation technique that leverages the past seven years’ load time-series observations.
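One plausible reading of this scheme, sketched in numpy, fills a missing timestamp slot with the mean of the same slot in the other available years (the `(n_years, n_slots)` layout and the cross-year mean rule are assumptions, not the exact procedure used here):

```python
import numpy as np

def impute_from_history(series_by_year):
    """Fill NaNs at a given timestamp slot with the mean of that slot
    across the years where it is observed.
    series_by_year: (n_years, n_slots) array; NaN marks a missing record."""
    filled = series_by_year.copy()
    slot_means = np.nanmean(series_by_year, axis=0)  # per-slot mean across years
    missing = np.isnan(filled)
    filled[missing] = np.broadcast_to(slot_means, filled.shape)[missing]
    return filled

# Three years of the same three timestamp slots, with two missing records
data = np.array([[100.0, 102.0, 98.0],
                 [104.0, np.nan, 101.0],
                 [np.nan, 99.0, 97.0]])
clean = impute_from_history(data)
# slot 0 is filled with (100 + 104) / 2 = 102.0; slot 1 with (102 + 99) / 2 = 100.5
```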
-
Data reduction and transformation [53]: These techniques (such as data discretization, dimensionality reduction, integration, and standardization) aim to prepare/convert the data into a format that is conducive to building learning models. The current approach employs the min–max normalization technique to transform the data to a common range. The mathematical representation of min–max normalization is given as follows [53]:$$\begin{aligned} N_{ts} = \frac{O_{ts} - min(O_{ts})}{max(O_{ts}) - min(O_{ts})}. \end{aligned}$$(1)Here, \(O_{ts}\) and \(N_{ts}\) represent the old time series and the new scaled time series, respectively. The maximum and minimum values of the old time series are represented by \(max(O_{ts})\) and \(min(O_{ts})\).
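Eq. (1) translates directly to code; a minimal numpy sketch (the example values are borrowed from the dataset summary table purely for illustration):

```python
import numpy as np

def min_max_normalize(o_ts):
    """Scale a series to [0, 1] per Eq. (1): (O - min) / (max - min)."""
    o_min, o_max = o_ts.min(), o_ts.max()
    return (o_ts - o_min) / (o_max - o_min)

# e.g. Andhra Pradesh min / mean / max load values from the dataset summary
load = np.array([93.5, 162.23, 284.8])
scaled = min_max_normalize(load)  # minimum maps to 0.0, maximum to 1.0
```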
Gaussian smoothing (Step-III)
Data decomposition (Step-IV)
Model building (Step-V and Step-VI)
Attention-based bi-directional GRU network model
Proposed data decomposition-based prediction strategy
-
Sub-step 1: Data decomposition—The CEEMDAN technique is implemented to decompose the load time-series dataset corresponding to each state into its IMFs and residual component.
-
Sub-step 2: The GRU model requires input data structured in three dimensions (S, W, and F). Here, S represents the number of input samples, W the sequence length, and F the number of features within each sequence. The lag parameter is applied to generate a feature input matrix for input to the bi-directional GRU models. An attention network is designed and included to capture the dependencies within each identified mode component.
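The windowing step can be sketched as follows (a minimal numpy illustration; `make_lag_windows` is a hypothetical helper, and reading S as the number of samples is an assumption):

```python
import numpy as np

def make_lag_windows(series, window, horizon=1):
    """Build an (S, W, F) input tensor and targets from a univariate series.
    S = number of samples, W = lag window length, F = 1 feature."""
    X, y = [], []
    for i in range(len(series) - window - horizon + 1):
        X.append(series[i:i + window])            # W past observations
        y.append(series[i + window + horizon - 1])  # value `horizon` steps ahead
    X = np.asarray(X)[..., np.newaxis]            # add the feature axis -> (S, W, 1)
    return X, np.asarray(y)

series = np.arange(30, dtype=float)
X, y = make_lag_windows(series, window=24, horizon=1)  # X.shape == (6, 24, 1)
```

The window length of 24 matches the input size reported in the hyper-parameter table.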
-
Sub-step 3: Performance validation is critical for establishing the reliability of the prediction models. Accordingly, the feature matrix obtained from sub-step 2 is divided into training, validation, and testing datasets. The training and validation sets are utilized to develop a model and obtain an unbiased estimate of its prediction accuracy.
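For time-series data, the split is typically chronological (no shuffling) so that future observations never leak into training; a minimal sketch with assumed 70/15/15 proportions (the split ratios are not stated here):

```python
import numpy as np

def chrono_split(X, y, train=0.7, val=0.15):
    """Chronological train/validation/test split: earlier samples train,
    later samples validate and test, preserving temporal order."""
    n = len(X)
    i = int(n * train)
    j = int(n * (train + val))
    return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])

X = np.arange(100).reshape(100, 1)
y = np.arange(100)
(train_X, _), (val_X, _), (test_X, _) = chrono_split(X, y)
```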
-
Sub-step 4: The attention-based bi-directional network models are trained and developed for each IMF and residual component (shown in Fig. 1). As desired, the relevant temporal intrinsic dependencies identified by the attention mechanism are effectively captured by the bi-directional GRU model during this stage.
-
Sub-step 5: After successful training, the developed sequential models are employed to generate forecasts for the testing dataset. The target prediction outcomes are determined by aggregating the forecasting results of all IMFs (corresponding to the respective state).
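The aggregation step mirrors CEEMDAN's additive reconstruction: the final forecast is the element-wise sum of the per-component forecasts. A minimal sketch with made-up component forecasts:

```python
import numpy as np

# Forecasts produced by the per-component models (IMFs + residual);
# values are illustrative, not real model outputs.
imf_forecasts = np.array([
    [1.0, 1.1, 0.9],   # forecast from the IMF-1 model
    [0.5, 0.4, 0.6],   # forecast from the IMF-2 model
    [2.0, 2.0, 2.1],   # forecast from the residual-trend model
])

# Element-wise sum across components gives the target prediction
final_forecast = imf_forecasts.sum(axis=0)
```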
Experimental results and discussion
Dataset description
State | Latitude | Longitude | Min | Max | Mean | Std. |
---|---|---|---|---|---|---|
Andhra Pradesh | 15.9129\(^{\circ }\) N | 79.7400\(^{\circ }\) E | 93.50 | 284.8 | 162.23 | 32.33 |
Karnataka | 15.3173\(^{\circ }\) N | 75.7139\(^{\circ }\) E | 112.20 | 273.3 | 184.45 | 27.46 |
Kerala | 10.1632\(^{\circ }\) N | 76.6413\(^{\circ }\) E | 38.90 | 89.4 | 65.66 | 6.97 |
Tamil Nadu | 11.1271\(^{\circ }\) N | 78.6569\(^{\circ }\) E | 144.0 | 365.4 | 284.52 | 32.07 |
Puducherry | 11.9416\(^{\circ }\) N | 79.8083\(^{\circ }\) E | 2.40 | 9.70 | 6.847 | 0.91 |
Data preprocessing and smoothing
Hyper-parameter name | Optimal range (for all States) |
---|---|
Input size | 24 |
Prediction horizon | 1 |
Number of Bi-GRU layers | 4 – 6 |
Number of neurons per layer | 32 – 256 |
Learning rate | 0.001 – 0.0001 |
Activation function | ‘ReLU’ |
Optimizer | ‘ADAM’ |
Andhra Pradesh

Model/Measure | RMSE | MAPE | MAE |
---|---|---|---|
LSTM | 6.320 | 0.0285 | 4.988 |
GRU | 5.694 | 0.0245 | 4.250 |
Bi-GRU | 5.787 | 0.0250 | 4.369 |
Proposed | 2.854 | 0.0103 | 2.210 |

Karnataka

Model/Measure | RMSE | MAPE | MAE |
---|---|---|---|
LSTM | 9.236 | 0.0385 | 7.392 |
GRU | 9.119 | 0.0363 | 7.257 |
Bi-GRU | 9.047 | 0.0340 | 7.194 |
Proposed | 4.230 | 0.0200 | 3.018 |

Kerala

Model/Measure | RMSE | MAPE | MAE |
---|---|---|---|
LSTM | 2.843 | 0.0307 | 2.107 |
GRU | 2.763 | 0.0293 | 1.996 |
Bi-GRU | 2.707 | 0.0288 | 1.832 |
Proposed | 2.486 | 0.0257 | 1.760 |

Tamil Nadu

Model/Measure | RMSE | MAPE | MAE |
---|---|---|---|
LSTM | 12.575 | 0.0310 | 8.821 |
GRU | 12.151 | 0.0306 | 8.767 |
Bi-GRU | 12.063 | 0.0303 | 8.293 |
Proposed | 6.875 | 0.0182 | 5.251 |
Time-series decomposition
Comparative evaluation of the proposed approach (Step-VII)
-
RMSE: a statistical measure that quantifies the magnitude of the errors between the model-predicted values and the actual observations.
-
MAPE: a metric that quantifies the error as a percentage deviation of the model-predicted observations from the actual values.
-
MAE: a measure of the mean absolute difference between the observed and the model-predicted values.
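These three measures translate directly to numpy (the fractional MAPE form below matches the ~0.01–0.05 magnitudes reported in the tables; the sample values are illustrative):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (multiply by 100 for %)."""
    return np.mean(np.abs((y_true - y_pred) / y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 300.0])
```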
Puducherry | | | |
---|---|---|---|
Model/Measure | RMSE | MAPE | MAE |
LSTM | 0.453 | 0.0486 | 0.347 |
GRU | 0.450 | 0.0474 | 0.332 |
Bi-GRU | 0.427 | 0.0472 | 0.331 |
Proposed Approach | 0.247 | 0.0255 | 0.182 |
-
Comparative evaluation results of the existing benchmark techniques (LSTM, GRU, and Bi-GRU) show that the bi-directional gated recurrent unit model performs best among them by capturing the bidirectional load-dependency patterns present in the data.
-
The proposed approach achieves an average improvement in prediction accuracy of about 50% over the benchmark models. This clearly validates that the proposed approach accurately identifies and estimates the non-linear patterns present in the energy consumption dataset of all southern states. Hence, integrating Gaussian smoothing and CEEMDAN into deep neural models can be considered a viable and accurate solution for improving the prediction results of models developed for different application domains.
Visualization of prediction results
Discussion
Conclusion
-
From the prediction plots shown in Fig. 4, it is evident that the inclusion of the data smoothing and decomposition strategy in the proposed approach enables effective and accurate capturing of the randomness and variations in the load time-series data. This resolves a significant drawback of the existing research studies in the targeted domain.
-
The comparative evaluation with state-of-the-art load-series prediction methods clearly demonstrates the prediction accuracy, reliability, and robustness of the proposed approach. From the prediction results, it is evident that the proposed approach outperforms traditional deep learning-based prediction models by providing a substantial reduction in prediction error. These performance benefits stem from the combined effect of the data decomposition and the attention-based bi-directional mechanism employed in the proposed approach.