Published in: Neural Computing and Applications 6/2021

23-06-2020 | Original Article

NaNOD: A natural neighbour-based outlier detection algorithm

Authors: Abdul Wahid, Chandra Sekhara Rao Annavarapu

Abstract

Outlier detection is an essential task in data mining, with applications that include military surveillance, tax fraud detection, and telecommunications. In recent years, outlier detection has received significant attention compared with other knowledge discovery problems. This focus has produced a growing number of outlier detection algorithms, most of them based on distance or density. However, each strategy has intrinsic weaknesses: distance-based techniques struggle with varying local density, while density-based methods are known to have difficulty with low-density patterns. In addition, most existing outlier detection algorithms suffer from a parameter selection problem, which leads to poor detection results. In this article, we present an unsupervised density-based outlier detection algorithm that addresses these shortcomings. The proposed algorithm uses the Natural Neighbour (NaN) concept to obtain a parameter called the Natural Value (NV) adaptively, and a Weighted Kernel Density Estimation (WKDE) method to estimate the density at the location of an object. The algorithm also employs two categories of nearest neighbours, k Nearest Neighbours (kNN) and Reverse Nearest Neighbours (RNN), which makes it flexible in modelling different data patterns. A Gaussian kernel function is adopted to achieve smoothness in the measure. Further, we use an adaptive kernel width to enhance the discrimination between normal and outlier samples. Formal analysis and extensive experiments on both artificial and real datasets demonstrate that this technique can achieve better outlier detection performance.
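
The abstract names the building blocks but gives no formulas, so the following Python sketch is only one plausible reading of them and is not the authors' implementation. It assumes the natural neighbour search as it is commonly described: the neighbourhood size r grows until the number of points without any reverse neighbour stops changing, and the final r is taken as the Natural Value. The density step then applies a Gaussian kernel over the union of each point's kNN and RNN sets. The function names, the uniform weights, and the choice of kernel width (each neighbour's distance to its own NV-th nearest neighbour) are assumptions made for illustration.

```python
import numpy as np


def natural_neighbour_search(X):
    """Return (nv, knn_idx, rnn) for data X of shape (n, d).

    Assumed stopping rule: grow r until the number of points with no
    reverse neighbour stops changing, or reaches zero.
    """
    n = X.shape[0]
    # Brute-force pairwise Euclidean distances; adequate for a small sketch.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)      # a point is never its own neighbour
    order = np.argsort(dist, axis=1)    # order[i, r - 1] is i's r-th nearest neighbour
    rnn = [set() for _ in range(n)]     # rnn[j] holds the points that list j as a neighbour
    prev_orphans = -1
    nv = 0
    for r in range(1, n):
        for i in range(n):
            rnn[order[i, r - 1]].add(i)
        nv = r
        orphans = sum(1 for s in rnn if not s)
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    return nv, order[:, :nv], rnn


def neighbourhood_density(X, nv, knn_idx, rnn):
    """Gaussian kernel density at each point over its kNN and RNN neighbours.

    Assumptions: uniform weights, and an adaptive width equal to each
    neighbour's distance to its own nv-th nearest neighbour.
    """
    n, d = X.shape
    width = np.linalg.norm(X - X[knn_idx[:, -1]], axis=1)
    width = np.maximum(width, 1e-12)    # guard against duplicate points
    density = np.zeros(n)
    for i in range(n):
        neigh = sorted(set(knn_idx[i].tolist()) | rnn[i])
        sq = np.sum((X[neigh] - X[i]) ** 2, axis=1)
        h = width[neigh]
        kernel = np.exp(-sq / (2.0 * h ** 2)) / ((2.0 * np.pi) ** (d / 2) * h ** d)
        density[i] = kernel.mean()
    return density


# Usage: a 5 x 5 grid of regular points plus one distant point.
grid = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
data = np.vstack([grid, [[50.0, 50.0]]])
nv, knn_idx, rnn = natural_neighbour_search(data)
density = neighbourhood_density(data, nv, knn_idx, rnn)
print("natural value:", nv)
print("lowest-density index:", int(np.argmin(density)))
```

In this sketch a low density suggests an outlier. How the paper turns densities into a final outlier score, and how it weights the kernel terms, is not stated in the abstract and is left out here.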

Metadata
Title
NaNOD: A natural neighbour-based outlier detection algorithm
Authors
Abdul Wahid
Chandra Sekhara Rao Annavarapu
Publication date
23-06-2020
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 6/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-05068-2
