Published in: The Journal of Supercomputing, Issue 3/2021

20 July 2020

An explainable outlier detection method using region-partition trees

Authors: Cheong Hee Park, Jiil Kim

Abstract

Most outlier detection methods output an outlier score that measures how far a data sample deviates from the normal data pattern. However, it is difficult to choose an optimal threshold on outlier scores that separates outliers from normal data samples. In this paper, we propose a tree-based outlier detection method that computes normalized outlier scores for data samples. In particular, it provides binary labels for outlier prediction without the need to determine a threshold on the outlier score. Using training data that consists of normal data samples, the proposed method builds a multi-way splitting tree, called a region-partition tree (RP-tree), in which the normal data region is described by the partition of the data region into leaf nodes. By utilizing a region-partition table (RP-table), which stores the splitting attributes and their interval partitions, an RP-tree can be constructed so that it finely splits the normal data region while keeping the tree reasonably small. From an ensemble of RP-trees, the proposed method computes normalized outlier scores in the range [0, 1], and data samples with an outlier score of 1 are predicted as outliers. It also identifies the attributes responsible for an outlier prediction. Experimental results demonstrate the outlier detection performance of the proposed method: it obtained an average F1-value of 0.72 and an AUC score of 0.96, while the second-highest performance among the compared methods was an F1-value of 0.57 and an AUC score of 0.94.
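The abstract does not give the construction details of the RP-tree or the RP-table, so the following is only a loose toy sketch of the general idea: partition attribute ranges into intervals using normal training data, score a sample by the fraction of ensemble members that place it outside the occupied regions, and label it an outlier only when that score equals 1. The class `GridTree`, the functions `fit_ensemble`, `score`, `predict` and `blamed_attributes`, the equal-width intervals, the random attribute subsets and all parameter values are assumptions made for illustration and are not taken from the paper.

```python
import random


class GridTree:
    """Toy stand-in for one tree: a few attributes, each cut into
    equal-width intervals (an assumption, not the paper's RP-tree)."""

    def __init__(self, X, attrs, n_bins):
        self.attrs = list(attrs)
        self.n_bins = n_bins
        # Per-attribute range observed in the normal training data.
        self.lo = {a: min(row[a] for row in X) for a in self.attrs}
        self.hi = {a: max(row[a] for row in X) for a in self.attrs}
        # Cells (interval combinations) occupied by at least one normal sample.
        self.cells = {self._cell(row) for row in X}

    def _bin(self, a, v):
        lo, hi = self.lo[a], self.hi[a]
        if v < lo or v > hi:
            return -1  # value lies outside the range seen in training
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * self.n_bins), self.n_bins - 1)

    def _cell(self, row):
        return tuple(self._bin(a, row[a]) for a in self.attrs)

    def is_outlier(self, row):
        # A sample is flagged when it falls in a cell with no normal data.
        return self._cell(row) not in self.cells


def fit_ensemble(X, n_trees=25, attrs_per_tree=2, n_bins=4, seed=0):
    """Build several toy trees, each on a random subset of attributes."""
    rng = random.Random(seed)
    d = len(X[0])
    k = min(attrs_per_tree, d)
    return [GridTree(X, rng.sample(range(d), k), n_bins) for _ in range(n_trees)]


def score(trees, row):
    """Normalized outlier score in [0, 1]: fraction of trees that flag the row."""
    return sum(t.is_outlier(row) for t in trees) / len(trees)


def predict(trees, row):
    """Binary label without a tuned threshold: outlier only if the score is 1."""
    return score(trees, row) == 1.0


def blamed_attributes(trees, row):
    """Attributes used by the flagging trees, most frequent first
    (a hypothetical attribution rule)."""
    counts = {}
    for t in trees:
        if t.is_outlier(row):
            for a in t.attrs:
                counts[a] = counts.get(a, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)
```

Fitting on samples from a regular grid in the unit cube, for example, and then calling `predict(trees, [5.0, 5.0, 5.0])` flags the sample, because every member of the ensemble sees at least one attribute value outside its training range. A sample that is unusual in only one attribute receives a score between 0 and 1 that depends on which attribute subsets were drawn.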

Metadata
Title
An explainable outlier detection method using region-partition trees
Authors
Cheong Hee Park
Jiil Kim
Publication date
20 July 2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 3/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03384-x
