Detecting outliers in industrial systems using a hybrid ensemble scheme

  • Original Article

Neural Computing and Applications

Abstract

The massive growth of process data in industrial systems has promoted the development of data-driven techniques, but outliers in the process data degrade their effectiveness. This paper focuses on detecting outliers in industrial systems under the assumption that no labeled training data are available. Our method is based on ensemble learning, and its base learners include both one-class and multi-class classifiers. The core idea is that a one-class classifier ensemble addresses the lack of labels, and a multi-class classifier ensemble is then used to further improve performance once outlier examples are available. The underlying motivation is that a classifier trained only on positive data will not perform as well as one trained on both positive and negative data. We evaluate the proposed scheme in a series of experiments on ten benchmark data sets and two real-world industrial systems, and the results confirm the effectiveness of our detection scheme.
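The abstract does not specify the base learners, the combination rule, or how outliers are thresholded, so the following is only a minimal sketch of the two-stage idea under stated assumptions, not the authors' implementation. A bagged k-nearest-neighbour data description stands in for the one-class ensemble, an assumed contamination fraction turns its scores into pseudo-labels, and a bagged two-class 1-nearest-neighbour classifier stands in for the multi-class ensemble. All function names, parameter values, and the demo data are hypothetical.

```python
import math
import random


def knn_distance(point, sample, k):
    """k-NN data description: distance from `point` to its k-th nearest
    neighbour in `sample`, ignoring copies of `point` itself (leave-one-out)."""
    dists = sorted(math.dist(point, s) for s in sample if s is not point)
    if not dists:
        return 0.0
    return dists[min(k, len(dists)) - 1]


def one_class_ensemble_scores(data, n_learners=15, k=3, seed=0):
    """Stage 1 (no labels): average the outlier scores of one-class base
    learners, each fitted on a bootstrap resample of the unlabeled data."""
    rng = random.Random(seed)
    totals = [0.0] * len(data)
    for _ in range(n_learners):
        sample = [rng.choice(data) for _ in data]  # bootstrap resample
        scores = [knn_distance(x, sample, k) for x in data]
        top = max(scores) or 1.0  # rescale to [0, 1] so learners weigh equally
        for i, s in enumerate(scores):
            totals[i] += s / top
    return [t / n_learners for t in totals]


def pseudo_label(scores, contamination):
    """Mark the highest-scoring `contamination` fraction as outliers (1)."""
    n_out = max(1, round(contamination * len(scores)))
    cutoff = sorted(scores, reverse=True)[n_out - 1]
    return [1 if s >= cutoff else 0 for s in scores]


def bagged_1nn_classifier(data, labels, n_learners=15, seed=1):
    """Stage 2 (outlier examples now available): majority vote of two-class
    1-NN classifiers, each trained on a bootstrap resample of labeled data."""
    rng = random.Random(seed)
    idx = range(len(data))
    boots = [[rng.choice(idx) for _ in idx] for _ in range(n_learners)]

    def predict(x):
        # Each base learner votes with the label of its nearest training point.
        votes = [labels[min(b, key=lambda i: math.dist(x, data[i]))] for b in boots]
        return max(set(votes), key=votes.count)

    return predict


# Hypothetical demo: a Gaussian cluster of "normal" process samples plus a few
# injected outliers; the detector never sees the true labels.
rng = random.Random(42)
inliers = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
outliers = [(8.0, 8.0), (-8.0, 8.0), (8.0, -8.0), (-8.0, -8.0), (0.0, 9.0)]
data = inliers + outliers

scores = one_class_ensemble_scores(data)
# The contamination fraction is assumed known here purely for the demo.
labels = pseudo_label(scores, contamination=len(outliers) / len(data))
predict = bagged_1nn_classifier(data, labels)
print(labels[-5:], predict((0.1, -0.2)), predict((9.0, 9.0)))
```

In this sketch, swapping in other base learners (for example, a one-class SVM in stage 1) would only change the scoring and classification functions; the two-stage structure of "score without labels, then train a discriminative ensemble on the detected outliers" stays the same.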

Figs. 1–6

Notes

  1. http://prtools.org/.

  2. https://www.tudelft.nl/ewi/over-de-faculteit/afdelingen/intelligent-systems/pattern-recognition-bioinformatics/pattern-recognition-laboratory/data-and-software/dd-tools/.

  3. http://archive.ics.uci.edu/ml/index.php.

  4. www.keel.es.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 51634002 and 61702070) and the National Key R&D Program of China (Grant No. 2017YFB0304104).

Author information

Corresponding author

Correspondence to Zhizhong Mao.

Ethics declarations

Conflict of interest

No conflict of interest exists in the submission of this manuscript, and the manuscript has been approved by all authors for publication.

About this article

Cite this article

Wang, B., Mao, Z. Detecting outliers in industrial systems using a hybrid ensemble scheme. Neural Comput & Applic 32, 8047–8063 (2020). https://doi.org/10.1007/s00521-019-04307-5

  • DOI: https://doi.org/10.1007/s00521-019-04307-5
