Skip to main content
Top
Published in: Data Mining and Knowledge Discovery 4/2021

01-04-2021

Homophily outlier detection in non-IID categorical data

Authors: Guansong Pang, Longbing Cao, Ling Chen

Published in: Data Mining and Knowledge Discovery | Issue 4/2021

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Most of existing outlier detection methods assume that the outlier factors (i.e., outlierness scoring measures) of data entities (e.g., feature values and data objects) are Independent and Identically Distributed (IID). This assumption does not hold in real-world applications where the outlierness of different entities is dependent on each other and/or taken from different probability distributions (non-IID). This may lead to the failure of detecting important outliers that are too subtle to be identified without considering the non-IID nature. The issue is even intensified in more challenging contexts, e.g., high-dimensional data with many noisy features. This work introduces a novel outlier detection framework and its two instances to identify outliers in categorical data by capturing non-IID outlier factors. Our approach first defines and incorporates distribution-sensitive outlier factors and their interdependence into a value-value graph-based representation. It then models an outlierness propagation process in the value graph to learn the outlierness of feature values. The learned value outlierness allows for either direct outlier detection or outlying feature selection. The graph representation and mining approach is employed here to well capture the rich non-IID characteristics. Our empirical results on 15 real-world data sets with different levels of data complexities show that (i) the proposed outlier detection methods significantly outperform five state-of-the-art methods at the 95%/99% confidence level, achieving 10–28% AUC improvement on the 10 most complex data sets; and (ii) the proposed feature selection methods significantly outperform three competing methods in enabling subsequent outlier detection of two different existing detectors.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Footnotes
1
Having excessive thirsty, weight loss and frequent urination are the abnormal concurrent symptoms in diagnosing type-2 diabetes according to https://​www.​diabetesaustrali​a.​com.​au/​.
 
2
We have ignored features with \( freq (m)=1\), as those features contain no useful information relevant to outlier detection.
 
3
\(\delta \) is normalized into the range in (0,1) to work well with \(\eta \).
 
4
The source codes of CBRW/SDRW-based outlier detection (or feature selection) algorithms are made publicly available at https://​sites.​google.​com/​site/​gspangsite/​sourcecode.
 
5
Since CompreX was implemented in a different programming language to the other methods, the runtime between CompreX and other methods is incomparable. Instead, we compare them in terms of runtime ratio, i.e., the runtime on a larger/higher-dimensional data set divided by that on a smaller/lower-dimensional data set, for a fairer comparison. Since the data size and the increasing factor of dimensionality are fixed, the runtime ratio is comparable across the methods in different programming languages.
 
Literature
go back to reference Aggarwal CC (2017a) Outlier detection in categorical, text, and mixed attribute data. In: Outlier analysis, pp 249–272. Springer, Berlin Aggarwal CC (2017a) Outlier detection in categorical, text, and mixed attribute data. In: Outlier analysis, pp 249–272. Springer, Berlin
go back to reference Akoglu L, Tong H, Vreeken J, Faloutsos C (2012) Fast and reliable anomaly detection in categorical data. In: CIKM, pp 415–424. ACM Akoglu L, Tong H, Vreeken J, Faloutsos C (2012) Fast and reliable anomaly detection in categorical data. In: CIKM, pp 415–424. ACM
go back to reference Akoglu L, Tong H, Koutra D (2015) Graph based anomaly detection and description: a survey. Data Min Knowl Disc 29(3):626–688MathSciNetCrossRef Akoglu L, Tong H, Koutra D (2015) Graph based anomaly detection and description: a survey. Data Min Knowl Disc 29(3):626–688MathSciNetCrossRef
go back to reference Andersen R, Chellapilla K (2009) Finding dense subgraphs with size bounds. In: Algorithms and models for the web-graph, pp 25–37 Andersen R, Chellapilla K (2009) Finding dense subgraphs with size bounds. In: Algorithms and models for the web-graph, pp 25–37
go back to reference Angiulli F, Fassetti F, Palopoli L (2009) Detecting outlying properties of exceptional objects. ACM Trans Datab Syst 34(1):7 Angiulli F, Fassetti F, Palopoli L (2009) Detecting outlying properties of exceptional objects. ACM Trans Datab Syst 34(1):7
go back to reference Angiulli F, Ben-Eliyahu-Zohary R, Palopoli L (2010) Outlier detection for simple default theories. Artif Intell 174(15):1247–1253MathSciNetMATHCrossRef Angiulli F, Ben-Eliyahu-Zohary R, Palopoli L (2010) Outlier detection for simple default theories. Artif Intell 174(15):1247–1253MathSciNetMATHCrossRef
go back to reference Azmandian F, Yilmazer A, Dy JG, Aslam J, Kaeli DR, et al (2012) GPU-accelerated feature selection for outlier detection using the local kernel density ratio. In ICDM, pp 51–60. IEEE Azmandian F, Yilmazer A, Dy JG, Aslam J, Kaeli DR, et al (2012) GPU-accelerated feature selection for outlier detection using the local kernel density ratio. In ICDM, pp 51–60. IEEE
go back to reference Boriah S, Chandola V, Kumar V (2008) Similarity measures for categorical data: a comparative evaluation. In: SDM, pp 243–254. SIAM Boriah S, Chandola V, Kumar V (2008) Similarity measures for categorical data: a comparative evaluation. In: SDM, pp 243–254. SIAM
go back to reference Breunig MM, Kriegel H-P, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. ACM SIGMOD Record 29(2):93–104CrossRef Breunig MM, Kriegel H-P, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. ACM SIGMOD Record 29(2):93–104CrossRef
go back to reference Brin S, Motwani R, Silverstein C (1997) Beyond market baskets: generalizing association rules to correlations. ACM SIGMOD Record 26(2):265–276CrossRef Brin S, Motwani R, Silverstein C (1997) Beyond market baskets: generalizing association rules to correlations. ACM SIGMOD Record 26(2):265–276CrossRef
go back to reference Campos GO, Zimek A, Sander J, Campello RJGB, Micenková B, Schubert E, Assent I, Houle ME (2016) On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min Knowl Disc 30(4):891–927MathSciNetCrossRef Campos GO, Zimek A, Sander J, Campello RJGB, Micenková B, Schubert E, Assent I, Houle ME (2016) On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min Knowl Disc 30(4):891–927MathSciNetCrossRef
go back to reference Cao L (2014) Non-iidness learning in behavioral and social data. Comput J 57(9):1358–1370CrossRef Cao L (2014) Non-iidness learning in behavioral and social data. Comput J 57(9):1358–1370CrossRef
go back to reference Cao L (2015) Coupling learning of complex interactions. Inf Process Manag 51(2):167–186CrossRef Cao L (2015) Coupling learning of complex interactions. Inf Process Manag 51(2):167–186CrossRef
go back to reference Cao L (2018) Data science thinking: the next scientific. Technological and Economic Revolution, Springer, Berlin Cao L (2018) Data science thinking: the next scientific. Technological and Economic Revolution, Springer, Berlin
go back to reference Cao L, Yuming O, Philip SY (2012) Coupled behavior analysis with applications. IEEE Trans Knowl Data Eng 24(8):1378–1392CrossRef Cao L, Yuming O, Philip SY (2012) Coupled behavior analysis with applications. IEEE Trans Knowl Data Eng 24(8):1378–1392CrossRef
go back to reference Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):15CrossRef Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):15CrossRef
go back to reference Chau DH, Nachenberg C, Wilhelm J, Wright A, Faloutsos C (2011) Polonium: Tera-scale graph mining and inference for malware detection. In: SDM, pp 131–142. SIAM Chau DH, Nachenberg C, Wilhelm J, Wright A, Faloutsos C (2011) Polonium: Tera-scale graph mining and inference for malware detection. In: SDM, pp 131–142. SIAM
go back to reference Das K, Schneider J (2007) Detecting anomalous records in categorical datasets. In: KDD, pp 220–229. ACM Das K, Schneider J (2007) Detecting anomalous records in categorical datasets. In: KDD, pp 220–229. ACM
go back to reference Emmott AF, Das S, Dietterich T, Fern A, Wong W-K (2013) Systematic construction of anomaly detection benchmarks from real data. In: KDD workshop, pp 16–21. ACM Emmott AF, Das S, Dietterich T, Fern A, Wong W-K (2013) Systematic construction of anomaly detection benchmarks from real data. In: KDD workshop, pp 16–21. ACM
go back to reference Fan X, Xu RYD, Cao L (2016) Copula mixed-membership stochastic blockmodel. In: IJCAI, pp 1462–1468 Fan X, Xu RYD, Cao L (2016) Copula mixed-membership stochastic blockmodel. In: IJCAI, pp 1462–1468
go back to reference Fill JA (1991) Eigenvalue bounds on convergence to stationarity for nonreversible markov chains, with an application to the exclusion process. Ann Appl Probab 1(1):62–87MathSciNetMATHCrossRef Fill JA (1991) Eigenvalue bounds on convergence to stationarity for nonreversible markov chains, with an application to the exclusion process. Ann Appl Probab 1(1):62–87MathSciNetMATHCrossRef
go back to reference Fowler JH, Christakis NA (2008) Dynamic spread of happiness in a large social network: longitudinal analysis over 20 years in the framingham heart study. BMJ 337:a2338CrossRef Fowler JH, Christakis NA (2008) Dynamic spread of happiness in a large social network: longitudinal analysis over 20 years in the framingham heart study. BMJ 337:a2338CrossRef
go back to reference Ganiz MC, George C, Pottenger WM (2011) Higher order naive bayes: a novel non-iid approach to text classification. IEEE Trans Knowl Data Eng 23(7):1022–1034CrossRef Ganiz MC, George C, Pottenger WM (2011) Higher order naive bayes: a novel non-iid approach to text classification. IEEE Trans Knowl Data Eng 23(7):1022–1034CrossRef
go back to reference Giacometti A, Soulet A (2016) Anytime algorithm for frequent pattern outlier detection. Int J Data Sci Anal, pp 1–12 Giacometti A, Soulet A (2016) Anytime algorithm for frequent pattern outlier detection. Int J Data Sci Anal, pp 1–12
go back to reference Gómez-Gardeñes J, Latora V (2008) Entropy rate of diffusion processes on complex networks. Phys Rev E 78(6):065102CrossRef Gómez-Gardeñes J, Latora V (2008) Entropy rate of diffusion processes on complex networks. Phys Rev E 78(6):065102CrossRef
go back to reference Guha S, Mishra N, Roy G, Schrijvers O (2016) Robust random cut forest based anomaly detection on streams. In: ICML, pp 2712–2721 Guha S, Mishra N, Roy G, Schrijvers O (2016) Robust random cut forest based anomaly detection on streams. In: ICML, pp 2712–2721
go back to reference Gupta M, Gao J, Aggarwal C, Han J (2014) Outlier detection for temporal data. Synth Lect Data Min Knowl Discov 5(1):1–129MATHCrossRef Gupta M, Gao J, Aggarwal C, Han J (2014) Outlier detection for temporal data. Synth Lect Data Min Knowl Discov 5(1):1–129MATHCrossRef
go back to reference Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newsl 11(1):10–18CrossRef Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newsl 11(1):10–18CrossRef
go back to reference Hand DJ, Till RJ (2001) A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach Learn 45(2):171–186MATHCrossRef Hand DJ, Till RJ (2001) A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach Learn 45(2):171–186MATHCrossRef
go back to reference He J (2017) Learning from data heterogeneity: algorithms and applications. In: IJCAI, pp 5126–5130 He J (2017) Learning from data heterogeneity: algorithms and applications. In: IJCAI, pp 5126–5130
go back to reference He J, Carbonell J (2010) Coselection of features and instances for unsupervised rare category analysis. Stat Anal Data Min 3(6):417–430MathSciNetMATHCrossRef He J, Carbonell J (2010) Coselection of features and instances for unsupervised rare category analysis. Stat Anal Data Min 3(6):417–430MathSciNetMATHCrossRef
go back to reference He Z, Xu X, Huang ZJ, Deng S (2005) FP-outlier: frequent pattern based outlier detection. Comput Sci Inf Syst 2(1):103–118CrossRef He Z, Xu X, Huang ZJ, Deng S (2005) FP-outlier: frequent pattern based outlier detection. Comput Sci Inf Syst 2(1):103–118CrossRef
go back to reference Ho TK, Basu M (2002) Complexity measures of supervised classification problems. IEEE Trans Pattern Anal Mach Intell 24(3):289–300CrossRef Ho TK, Basu M (2002) Complexity measures of supervised classification problems. IEEE Trans Pattern Anal Mach Intell 24(3):289–300CrossRef
go back to reference Ienco D, Pensa RG, Meo R (2017) A semisupervised approach to the detection and characterization of outliers in categorical data. IEEE Trans Neural Netw Learn Syst 28(5):1017–1029CrossRef Ienco D, Pensa RG, Meo R (2017) A semisupervised approach to the detection and characterization of outliers in categorical data. IEEE Trans Neural Netw Learn Syst 28(5):1017–1029CrossRef
go back to reference Jian S, Cao L, Pang G, Lu K, Gao H (2017) Embedding-based representation of categorical data by hierarchical value coupling learning. In: IJCAI, pp 1937–1943 Jian S, Cao L, Pang G, Lu K, Gao H (2017) Embedding-based representation of categorical data by hierarchical value coupling learning. In: IJCAI, pp 1937–1943
go back to reference Khuller S, Barna S (2009) On finding dense subgraphs. Automata, Languages and Programming, pp 597–608 Khuller S, Barna S (2009) On finding dense subgraphs. Automata, Languages and Programming, pp 597–608
go back to reference Koufakou A, Georgiopoulos M (2010) A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes. Data Min Knowl Disc 20(2):259–289MathSciNetCrossRef Koufakou A, Georgiopoulos M (2010) A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes. Data Min Knowl Disc 20(2):259–289MathSciNetCrossRef
go back to reference Koufakou A, Secretan J, Georgiopoulos M (2011) Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data. Knowl Inf Syst 29(3):697–725CrossRef Koufakou A, Secretan J, Georgiopoulos M (2011) Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data. Knowl Inf Syst 29(3):697–725CrossRef
go back to reference Koutra D, Ke T-Y, Kang U, Chau D, Pao H-K, Faloutsos C (2011) Unifying guilt-by-association approaches: theorems and fast algorithms. In: Machine learning and knowledge discovery in databases, pp 245–260 Koutra D, Ke T-Y, Kang U, Chau D, Pao H-K, Faloutsos C (2011) Unifying guilt-by-association approaches: theorems and fast algorithms. In: Machine learning and knowledge discovery in databases, pp 245–260
go back to reference Leyva E, González A, Perez R (2015) A set of complexity measures designed for applying meta-learning to instance selection. IEEE Trans Knowl Data Eng 27(2):354–367CrossRef Leyva E, González A, Perez R (2015) A set of complexity measures designed for applying meta-learning to instance selection. IEEE Trans Knowl Data Eng 27(2):354–367CrossRef
go back to reference Liang J, Parthasarathy S (2016) Robust contextual outlier detection: Where context meets sparsity. In: Proceedings of the 25th ACM international on conference on information and knowledge management, pp 2167–2172. ACM Liang J, Parthasarathy S (2016) Robust contextual outlier detection: Where context meets sparsity. In: Proceedings of the 25th ACM international on conference on information and knowledge management, pp 2167–2172. ACM
go back to reference Liu FT, Ting KM, Zhou Z-H (2012) Isolation-based anomaly detection. ACM Trans Knowl Discov Data 6(1):3:1–3:39CrossRef Liu FT, Ting KM, Zhou Z-H (2012) Isolation-based anomaly detection. ACM Trans Knowl Discov Data 6(1):3:1–3:39CrossRef
go back to reference Maldonado S, Weber R, Famili F (2014) Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf Sci 286:228–246CrossRef Maldonado S, Weber R, Famili F (2014) Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf Sci 286:228–246CrossRef
go back to reference McGlohon M, Bay S, Anderle MG, Steier DM, Faloutsos C (2009) SNARE: a link analytic system for graph labeling and risk detection. In: KDD, pp 1265–1274. ACM McGlohon M, Bay S, Anderle MG, Steier DM, Faloutsos C (2009) SNARE: a link analytic system for graph labeling and risk detection. In: KDD, pp 1265–1274. ACM
go back to reference McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Ann Rev Sociol 27(1):415–444CrossRef McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Ann Rev Sociol 27(1):415–444CrossRef
go back to reference Meyer CD (2000) Matrix analysis and applied linear algebra. SIAM, PhiladelphiaCrossRef Meyer CD (2000) Matrix analysis and applied linear algebra. SIAM, PhiladelphiaCrossRef
go back to reference Otey ME, Ghoting A, Parthasarathy S (2006) Fast distributed outlier detection in mixed-attribute data sets. Data Min Knowl Disc 12(2–3):203–228MathSciNetCrossRef Otey ME, Ghoting A, Parthasarathy S (2006) Fast distributed outlier detection in mixed-attribute data sets. Data Min Knowl Disc 12(2–3):203–228MathSciNetCrossRef
go back to reference Page L, Brin S, Motwani R, Winograd T (1998) The PageRank citation ranking: bringing order to the web. In: WWW conference, pp 161–172 Page L, Brin S, Motwani R, Winograd T (1998) The PageRank citation ranking: bringing order to the web. In: WWW conference, pp 161–172
go back to reference Pang G, Ting KM, Albrecht D (2015) LeSiNN: detecting anomalies by identifying least similar nearest neighbours. In: ICDM workshop, pp 623–630. IEEE Pang G, Ting KM, Albrecht D (2015) LeSiNN: detecting anomalies by identifying least similar nearest neighbours. In: ICDM workshop, pp 623–630. IEEE
go back to reference Pang G, Cao L, Chen L (2016) Outlier detection in complex categorical data by modelling the feature value couplings. In IJCAI, pp 1902–1908 Pang G, Cao L, Chen L (2016) Outlier detection in complex categorical data by modelling the feature value couplings. In IJCAI, pp 1902–1908
go back to reference Pang G, Cao L, Chen L, Lian D, Liu H (2018) Sparse modeling-based sequential ensemble learning for effective outlier detection in high-dimensional numeric data. In: Thirty-second AAAI conference on artificial intelligence Pang G, Cao L, Chen L, Lian D, Liu H (2018) Sparse modeling-based sequential ensemble learning for effective outlier detection in high-dimensional numeric data. In: Thirty-second AAAI conference on artificial intelligence
go back to reference Rayana S, Akoglu L (2016) Less is more: building selective anomaly ensembles. ACM Trans Knowl Discov Data 10(4):42CrossRef Rayana S, Akoglu L (2016) Less is more: building selective anomaly ensembles. ACM Trans Knowl Discov Data 10(4):42CrossRef
go back to reference Rayana S, Zhong W, Akoglu L (2016) Sequential ensemble learning for outlier detection: a bias-variance perspective. In: 2016 IEEE 16th international conference on data mining (ICDM), pp 1167–1172. IEEE Rayana S, Zhong W, Akoglu L (2016) Sequential ensemble learning for outlier detection: a bias-variance perspective. In: 2016 IEEE 16th international conference on data mining (ICDM), pp 1167–1172. IEEE
go back to reference Schubert E, Wojdanowski R, Zimek A, Kriegel H-P (2012) On evaluation of outlier rankings and outlier scores. In: Proceedings of the 2012 SIAM international conference on data mining, pp 1047–1058. SIAM Schubert E, Wojdanowski R, Zimek A, Kriegel H-P (2012) On evaluation of outlier rankings and outlier scores. In: Proceedings of the 2012 SIAM international conference on data mining, pp 1047–1058. SIAM
go back to reference Smets K, Vreeken J (2011) The odd one out: identifying and characterising anomalies. In: SDM, pp 109–148. SIAM Smets K, Vreeken J (2011) The odd one out: identifying and characterising anomalies. In: SDM, pp 109–148. SIAM
go back to reference Smith MR, Martinez T, Giraud-Carrier C (2014) An instance level analysis of data complexity. Mach Learn 95(2):225–256MathSciNetCrossRef Smith MR, Martinez T, Giraud-Carrier C (2014) An instance level analysis of data complexity. Mach Learn 95(2):225–256MathSciNetCrossRef
go back to reference Sugiyama M, Borgwardt K (2013) Rapid distance-based outlier detection via sampling. In: NIPS, pp 467–475 Sugiyama M, Borgwardt K (2013) Rapid distance-based outlier detection via sampling. In: NIPS, pp 467–475
go back to reference Sun Y, Han J (2012) Mining heterogeneous information networks: principles and methodologies. Synth Lect Data Min Knowl Discov 3(2):1–159CrossRef Sun Y, Han J (2012) Mining heterogeneous information networks: principles and methodologies. Synth Lect Data Min Knowl Discov 3(2):1–159CrossRef
go back to reference Tang G, Pei J, Bailey J, Dong G (2015) Mining multidimensional contextual outliers from categorical relational data. Intell Data Anal 19(5):1171–1192CrossRef Tang G, Pei J, Bailey J, Dong G (2015) Mining multidimensional contextual outliers from categorical relational data. Intell Data Anal 19(5):1171–1192CrossRef
go back to reference Tang J, Gao H, Hu X, Liu H (2013) Exploiting homophily effect for trust prediction. In: WSDM, pp 53–62. ACM Tang J, Gao H, Hu X, Liu H (2013) Exploiting homophily effect for trust prediction. In: WSDM, pp 53–62. ACM
go back to reference Ting KM, Washio T, Wells JR, Aryal S (2017) Defying the gravity of learning curve: a characteristic of nearest neighbour anomaly detectors. Mach Learn 106(1):55–91MathSciNetMATHCrossRef Ting KM, Washio T, Wells JR, Aryal S (2017) Defying the gravity of learning curve: a characteristic of nearest neighbour anomaly detectors. Mach Learn 106(1):55–91MathSciNetMATHCrossRef
go back to reference Wong W-K, Moore A, Cooper G, Wagner M (2003) Bayesian network anomaly pattern detection for disease outbreaks. In: ICML, pp 808–815 Wong W-K, Moore A, Cooper G, Wagner M (2003) Bayesian network anomaly pattern detection for disease outbreaks. In: ICML, pp 808–815
go back to reference Shu W, Wang S (2013) Information-theoretic outlier detection for large-scale categorical data. IEEE Trans Knowl Data Eng 25(3):589–602CrossRef Shu W, Wang S (2013) Information-theoretic outlier detection for large-scale categorical data. IEEE Trans Knowl Data Eng 25(3):589–602CrossRef
go back to reference Zhang Q, Cao L, Zhu C, Li Z, Sun J (2018) Coupledcf: learning explicit and implicit user-item couplings in recommendation for deep collaborative filtering. In: IJCAI’2018, pp 3662–3668 Zhang Q, Cao L, Zhu C, Li Z, Sun J (2018) Coupledcf: learning explicit and implicit user-item couplings in recommendation for deep collaborative filtering. In: IJCAI’2018, pp 3662–3668
go back to reference Zheng G, Brantley SL, Lauvaux T, Li Z (2017) Contextual spatial outlier detection with metric learning. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 2161–2170. ACM Zheng G, Brantley SL, Lauvaux T, Li Z (2017) Contextual spatial outlier detection with metric learning. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 2161–2170. ACM
go back to reference Zhou Z-H, Sun Y-Y, Li Y-F (2009) Multi-instance learning by treating instances as non-iid samples. In: ICML, pp 1249–1256. ACM Zhou Z-H, Sun Y-Y, Li Y-F (2009) Multi-instance learning by treating instances as non-iid samples. In: ICML, pp 1249–1256. ACM
go back to reference Zimek A, Campello RJGB, Sander J (2013) Ensembles for unsupervised outlier detection: challenges and research questions. ACM SIGKDD Explor Newsl 15(1):11–22CrossRef Zimek A, Campello RJGB, Sander J (2013) Ensembles for unsupervised outlier detection: challenges and research questions. ACM SIGKDD Explor Newsl 15(1):11–22CrossRef
Metadata
Title
Homophily outlier detection in non-IID categorical data
Authors
Guansong Pang
Longbing Cao
Ling Chen
Publication date
01-04-2021
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 4/2021
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-021-00750-y

Other articles of this Issue 4/2021

Data Mining and Knowledge Discovery 4/2021 Go to the issue

Premium Partner