Published in: Soft Computing 20/2017

8 July 2016 | Focus

Streaming data anomaly detection method based on hyper-grid structure and online ensemble learning

Written by: Zhiguo Ding, Minrui Fei, Dajun Du, Fan Yang

Abstract

This paper proposes a novel online anomaly detection method for streaming data. The method combines two ideas. First, an improved \(L_{1}\) detection neighbor region optimizes the original hyper-grid-based anomaly detection method by reducing the number of neighbor regions that must be examined. Second, online ensemble learning adapts to the evolving distribution of streaming data and overcomes the difficulty of obtaining the optimal hyper-grid structure. The method is validated on one real-world dataset and two simulated datasets, and the experimental results are close to the optimal results.
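The abstract does not give the algorithm itself, so the following Python sketch is only an illustration of the two ideas under stated assumptions, not the authors' method. It assumes that a hyper-grid assigns each point to an axis-aligned cell of fixed width, that the \(L_{1}\) neighbor region consists of the cell plus the cells at Manhattan distance 1 (2d + 1 cells instead of the 3^d cells of a full neighborhood), and that the ensemble is a majority vote over grids of different cell widths. All names (`GridDetector`, `GridEnsemble`, `min_points`, `widths`) are hypothetical, and the sketch has no forgetting mechanism, so it does not by itself adapt to an evolving distribution.

```python
from collections import Counter
from itertools import product
import math


def cell_of(point, width):
    """Map a point to the integer index of the grid cell that contains it."""
    return tuple(math.floor(x / width) for x in point)


def l1_neighbors(cell):
    """Yield the cell and the cells at Manhattan distance 1 (2*d + 1 cells)."""
    yield cell
    for axis in range(len(cell)):
        for step in (-1, 1):
            yield cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]


def full_neighbors(cell):
    """Yield the cell and every adjacent cell (3**d cells), for comparison."""
    for offset in product((-1, 0, 1), repeat=len(cell)):
        yield tuple(c + o for c, o in zip(cell, offset))


class GridDetector:
    """Hypothetical detector: a point is anomalous when its L1 region
    has seen fewer than min_points points so far."""

    def __init__(self, width, min_points):
        self.width = width
        self.min_points = min_points
        self.counts = Counter()  # points seen per cell

    def update(self, point):
        self.counts[cell_of(point, self.width)] += 1

    def is_anomaly(self, point):
        cell = cell_of(point, self.width)
        support = sum(self.counts[c] for c in l1_neighbors(cell))
        return support < self.min_points


class GridEnsemble:
    """Hypothetical ensemble: majority vote over grids of different widths,
    so that no single cell width has to be chosen in advance."""

    def __init__(self, widths, min_points):
        self.members = [GridDetector(w, min_points) for w in widths]

    def process(self, point):
        """Score the point against the current grids, then learn from it."""
        votes = sum(m.is_anomaly(point) for m in self.members)
        for m in self.members:
            m.update(point)
        return 2 * votes > len(self.members)


ensemble = GridEnsemble(widths=(0.5, 1.0, 2.0), min_points=3)
for _ in range(20):
    ensemble.process((0.2, 0.2))       # build up a dense region
print(ensemble.process((0.3, 0.3)))    # inside the dense region: False
print(ensemble.process((50.0, 50.0)))  # far from all seen data: True
```

In three dimensions the assumed \(L_{1}\) region covers 7 cells where the full neighborhood covers 27, which is the kind of reduction in neighbor regions the abstract refers to. Note that the first few points of any stream are flagged by this sketch because no cell has accumulated support yet.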

Metadata
Title
Streaming data anomaly detection method based on hyper-grid structure and online ensemble learning
Written by
Zhiguo Ding
Minrui Fei
Dajun Du
Fan Yang
Publication date
8 July 2016
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 20/2017
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-016-2258-z
