2021 | OriginalPaper | Chapter

Outlier Detection with the Use of Isolation Forests

Authors: Krzysztof Najman, Krystian Zieliński

Published in: Data Analysis and Classification

Publisher: Springer International Publishing

Abstract

Appropriate preparation of data for analysis is a key element of empirical research. Depending on the source of the data or the nature of the phenomenon studied, some observations may differ significantly from others. Including such cases in a study may seriously distort the profile of the population under examination. Nevertheless, omitting them can be equally disadvantageous. When analyzing dynamically changing phenomena, especially in the case of big data, a relatively small number of outliers may constitute a coherent and internally homogeneous group which, as subsequent observations are recorded, may grow into an independent cluster. Whether or not an outlier is removed from the dataset, the researcher must first be aware of its existence. For this purpose, an appropriate method of anomaly detection should be used. Identifying such units allows the researcher to make an appropriate decision regarding the further steps in the analysis.
The assessment of the usefulness of outlier detection methods is increasingly influenced by the possibility of applying them to big data problems. The algorithms should be effective for large and diverse datasets that are, in addition, subject to constant change. For these reasons, apart from high sensitivity, low computational time and the adaptability of the algorithm are also important.
The aim of the research presented is to assess the usefulness of Isolation Forests in outlier detection. The properties of the algorithm and its extensions are analyzed, and the results of simulation and empirical research on selected datasets are presented. The evaluation of the algorithm takes into account the impact of particular features of big datasets on the effectiveness of the methods analyzed.
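
For readers unfamiliar with the method, the following is a minimal sketch of the basic isolation forest idea as it is commonly formulated: random axis-parallel splits isolate unusual points after fewer steps, and a short average path length translates into a score close to 1. It is not the authors' implementation and does not cover the extensions mentioned in the abstract. The function names, the defaults (100 trees, subsample size 256), and the synthetic data are assumptions chosen for illustration, and the sketch assumes at least two data points.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random threshold until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # no split possible on this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample of the data."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def anomaly_score(point, trees, psi):
    """Score in (0, 1]; values near 1 indicate points that are easy to isolate."""
    mean_depth = sum(path_length(point, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(psi))


# Synthetic illustration: 200 standard normal points plus one distant point.
demo_rng = random.Random(42)
data = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(200)]
data.append((10.0, 10.0))  # hypothetical outlier, added last

trees, psi = fit_forest(data)
scores = [anomaly_score(p, trees, psi) for p in data]
print("score of the distant point:", round(scores[-1], 3))
print("highest score among the other points:", round(max(scores[:-1]), 3))
```

In this toy setting the distant point should receive the highest score. Where the cut-off between outliers and regular observations is placed remains a decision for the researcher, in line with the abstract's remark that detection only informs the subsequent decision.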

Metadata
Title
Outlier Detection with the Use of Isolation Forests
Authors
Krzysztof Najman
Krystian Zieliński
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-75190-6_5