Dynamic intelligent cleaning model of dirty electric load data

https://doi.org/10.1016/j.enconman.2007.08.007

Abstract

The load database derived from the supervisory control and data acquisition (SCADA) system contains a number of dirty data points. The data must therefore be carefully and reasonably adjusted before they are used for electric load forecasting or power system analysis. This paper proposes a dynamic and intelligent data cleaning model based on data mining theory. First, on the basis of fuzzy soft clustering, the Kohonen clustering network is improved to carry out the parallel calculation of fuzzy c-means soft clustering. Then, the proposed dynamic algorithm automatically finds the new clustering center (the characteristic curve of the data) from the updated sample data. Finally, the clustering network is combined with a radial basis function neural network (RBFNN) to form an intelligent adjusting model that identifies the dirty data. The rapid and dynamic performance of the model makes it suitable for real-time calculation, and its efficiency and accuracy are demonstrated by test results from electrical load data analysis in Chongqing.

Introduction

Accurate load forecasting improves the security of the power system and reduces generation costs. Load forecasting is closely tied to power system operations such as dispatch scheduling, preventive maintenance planning for generators and reliability evaluation of the power system. In addition, accurately estimated loads are key inputs for electricity price forecasting in electric power markets. So far, many studies have sought to improve the prediction accuracy of load forecasting using various conventional methods such as regression models, expert systems, artificial neural networks, fuzzy inference and hybrid algorithms [1], [2], [3], [4], [5], [6], [7].

Because of transmission errors in the information channel, faults of the remote terminal unit (RTU) and similar causes, the load data derived from the supervisory control and data acquisition (SCADA) system contain some dirty data. Using these load data directly may reduce the accuracy of load forecasting, so it is necessary to identify and adjust the dirty data, which is an important step of data mining [8].

Various methods have been proposed to identify and adjust dirty data, but there is still no systematic method that solves the problem effectively in all cases. Sequential probabilistic ratio analysis has been used as an outlier detection tool for stationary time series [9], but this method requires relevant information about the parameters of the data set, such as the data distribution, which is unknown in many cases. Learning vector quantization (LVQ) has been used to remove dirty data in Ref. [10]. This method treats the data as an array of vectors: if one element of a vector is dirty, the whole vector is eliminated. Because the method cannot identify the exact location of the dirty data, a great deal of useful information is lost at the same time.

In this paper, a dynamic and intelligent three-layer model based on data mining theory is proposed. The first layer extracts the characteristic curves from the load data using the Kohonen clustering network improved by the fuzzy soft clustering algorithm. In the second layer, a radial basis function neural network (RBFNN) is used to construct a pattern classifier for identifying dirty data. In the third layer, each dirty value is replaced by the weighted sum of the values at the same position in the two characteristic curves with the highest membership grades.
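The third-layer replacement step can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the paper's implementation: the memberships use the standard fuzzy c-means formula with fuzzifier m = 2, the dirty point is left out of the distance calculation, and the weights are the two largest memberships normalized to sum to one. This text does not give the weighting, and the names `fcm_memberships` and `repair_point` are hypothetical.

```python
import numpy as np

def fcm_memberships(curve, centers, m=2.0, mask=None):
    """Standard fuzzy c-means membership of one curve in each center.

    mask selects the points used for the distance (assumption: dirty
    points are excluded so they do not distort the membership).
    """
    if mask is None:
        mask = np.ones(curve.shape, dtype=bool)
    d = np.linalg.norm(centers[:, mask] - curve[mask], axis=1)
    if np.any(d == 0):
        # The curve coincides with one or more centers: full membership there.
        u = (d == 0).astype(float)
        return u / u.sum()
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

def repair_point(curve, centers, idx, m=2.0):
    """Replace curve[idx] by a weighted sum of two characteristic curves."""
    mask = np.ones(curve.shape, dtype=bool)
    mask[idx] = False
    u = fcm_memberships(curve, centers, m, mask)
    top2 = np.argsort(u)[-2:]        # the two centers with highest membership
    w = u[top2] / u[top2].sum()      # assumed weighting: normalized memberships
    repaired = curve.copy()
    repaired[idx] = np.dot(w, centers[top2, idx])
    return repaired
```

Here `centers` holds one characteristic curve per row and `idx` is the position that the classifier has flagged as dirty; only that position is changed.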

As the sample data are updated, the proposed dynamic clustering algorithm automatically searches for new vectors, namely the characteristic curves. The model addresses the deficiencies of the methods in the references above and offers high accuracy, real-time operation and dynamic updating. Its efficiency and accuracy are demonstrated by test results from electrical load data analysis in Chongqing.
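One way to picture a dynamic center update is to restart a clustering iteration from the previous centers whenever new curves arrive. The Python sketch below uses the standard fuzzy c-means updates for this. It is an illustration only: it is not the FKCN or the dynamic algorithm of the paper, whose details are not given here, and the warm-start strategy and the function names `memberships` and `update_centers` are assumptions.

```python
import numpy as np

def memberships(data, centers, m=2.0):
    """Fuzzy c-means memberships, shape (n_samples, n_centers)."""
    d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # guard against division by zero
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def update_centers(data, centers, m=2.0, tol=1e-6, max_iter=100):
    """Iterate the fuzzy c-means center update, starting from given centers."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        um = memberships(data, centers, m) ** m
        # Each new center is the membership-weighted mean of all samples.
        new_centers = (um.T @ data) / um.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers
```

Calling `update_centers(updated_data, previous_centers)` after each batch of new curves reuses the earlier result as the starting point, which usually needs fewer iterations than a fresh random start.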


Principle and structure of intelligent adjusting model of dirty data

Similarity and smoothness are two important characteristics of electrical load curves. The peak times of daily curves are generally the same, and neighboring points usually vary little, so the presence of dirty data clearly destroys the smoothness. However, the similarity remains unchanged because the amount of dirty data is small. Therefore, characteristic patterns can be extracted from many load curves that may contain dirty data using the clustering

The analysis of results

Workday and weekend data are fed into the FKCN separately, because these two kinds of load curves are obviously different. Separating them reduces the amount of training calculation and the number of clustering centers, which increases the calculation speed and improves the efficiency of the model. The following example is derived from electrical load data from April to September 2003 of the Jiangbei power supply bureau in Chongqing, China.
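The separation of workday and weekend curves before clustering can be sketched in a few lines of Python. This is a hypothetical helper, not code from the paper: it assumes the daily curves are keyed by calendar date and treats Saturday and Sunday as the weekend. Public holidays are not handled, since this text does not say how they were treated.

```python
from datetime import date

def split_by_day_type(daily_curves):
    """Split {date: load curve} into workday and weekend dictionaries."""
    workday, weekend = {}, {}
    for day, curve in daily_curves.items():
        # date.weekday(): Monday is 0, Sunday is 6
        target = weekend if day.weekday() >= 5 else workday
        target[day] = curve
    return workday, weekend
```

Each of the two groups would then be clustered on its own.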

Conclusion

The analysis of the examples shows that the FKCN algorithm improves the capability of Kohonen clustering networks and obtains the clustering centers more quickly and reasonably, overcoming the disadvantages of the Kohonen algorithm. The proposed dynamic updating algorithm adjusts the clustering centers automatically on the basis of newly added data, and the RBF networks can identify the exact location of dirty data because of their strong pattern recognition ability. The dynamic

References (21)

  • E.C. Tsao et al. Fuzzy Kohonen clustering networks. Pattern Recogn (1994)
  • S. Rahman et al. An expert system based algorithm for short term load forecast. IEEE Trans Power Syst (1988)
  • H. Mori et al. Optimal fuzzy inference for short-term load forecasting. IEEE Trans Power Syst (1996)
  • Kyung-Bin Song et al. Short-term load forecasting for the holidays using fuzzy linear regression method. IEEE Trans Power Syst (2005)
  • K.H. Kim. Development of fuzzy expert system for short-term load forecasting on special days. IEEE Trans Power Syst (1998)
  • J. Nazarko et al. The fuzzy regression approach to peak load estimation in power distribution systems. IEEE Trans Power Syst (1999)
  • W. Charytoniuk et al. Very short-term load forecasting using artificial neural networks. IEEE Trans Power Syst (2000)
  • S.H. Ling et al. Short-term electric load forecasting based on a neural fuzzy network. IEEE Trans Ind Electron (2003)
  • Kokyo Cho. Outlier detection for stationary time series. J Stat Plan Infer (2001)
There are more references available in the full text version of this article.
