2018 | Original Paper | Book Chapter

2. Data Preprocessing Techniques

Authors: Jun Zhao, Wei Wang, Chunyang Sheng

Published in: Data-Driven Prediction for Industrial Processes and Their Applications

Publisher: Springer International Publishing

Abstract

Raw industrial data collected on-site by supervisory control and data acquisition (SCADA) systems can rarely be used directly to build a prediction model, because such data are typically contaminated by high-level noise, missing points, and outliers caused by real-time database malfunctions, data transformation, or maintenance. Data preprocessing is therefore required, usually comprising anomaly detection, data imputation, and de-noising. For outliers, this chapter introduces anomaly detection methods based on the fuzzy C-means (FCM), K-nearest-neighbor (KNN), and dynamic time warping (DTW) algorithms. For missing data points, it describes a series of imputation methods: after the generic regression filling and expectation-maximization (EM) methods, it presents a varied window similarity measure method, a segmented shape-representation-based method, and a non-equal-length granules correlation method for industrial data imputation. For the high-level noise in raw data, it introduces the well-known empirical mode decomposition (EMD) method. A number of industrial case studies are provided to verify the effectiveness of these methods.
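To make the detect-then-impute idea concrete, the following is a minimal Python sketch and not the chapter's own implementation. It scores each sample by its mean distance to its k nearest neighbours, flags samples whose score is far above the median, and refills them. The function names, the threshold rule (`factor` times the median score), and the sample readings are all assumptions chosen for illustration. Plain linear interpolation is swapped in as a simple stand-in for the imputation methods named in the abstract.

```python
import numpy as np


def knn_outlier_scores(values, k=3):
    """Mean absolute distance from each sample to its k nearest neighbours."""
    values = np.asarray(values, dtype=float)
    # Pairwise absolute distances between all samples (n x n matrix).
    dist = np.abs(values[:, None] - values[None, :])
    # Ignore each sample's zero distance to itself.
    np.fill_diagonal(dist, np.inf)
    # Keep the k smallest distances per row; isolated samples score high.
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.mean(axis=1)


def flag_outliers(values, k=3, factor=3.0):
    """Boolean mask of samples whose score exceeds factor * median score.

    The threshold rule is a hypothetical choice for this sketch.
    """
    scores = knn_outlier_scores(values, k)
    return scores > factor * np.median(scores)


def interpolate_flagged(values, mask):
    """Replace flagged samples by linear interpolation over the rest."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(len(values))
    filled = values.copy()
    filled[mask] = np.interp(idx[mask], idx[~mask], values[~mask])
    return filled


# Hypothetical flow readings with one obvious spike at index 4.
readings = [10.0, 10.2, 9.9, 10.1, 50.0, 10.0, 9.8, 10.3]
mask = flag_outliers(readings, k=3, factor=3.0)
cleaned = interpolate_flagged(readings, mask)
```

This value-based score ignores the time ordering of the samples, so it only catches points that are globally isolated. Sequence-aware measures such as DTW compare the shapes of whole segments instead.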

Title: Data Preprocessing Techniques
Authors: Jun Zhao, Wei Wang, Chunyang Sheng
Publisher: Springer International Publishing
Book: Data-Driven Prediction for Industrial Processes and Their Applications
Print ISBN: 978-3-319-94050-2
Electronic ISBN: 978-3-319-94051-9
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-319-94051-9_2
