Published in: Data Mining and Knowledge Discovery 5/2018

02-08-2018

Explaining anomalies in groups with characterizing subspace rules

Authors: Meghanath Macha, Leman Akoglu

Abstract

Anomaly detection has numerous applications and has been studied extensively. We consider a complementary problem that has a much sparser literature: anomaly description. Interpretation of anomalies is crucial for practitioners for sense-making, troubleshooting, and planning actions. To this end, we present a new approach called x-PACS (for eXplaining Patterns of Anomalies with Characterizing Subspaces), which “reverse-engineers” the known anomalies by identifying (1) the groups (or patterns) that they form, and (2) the characterizing subspace and feature rules that separate each anomalous pattern from normal instances. Explaining anomalies in groups not only saves analyst time and gives insight into various types of anomalies, but also draws attention to potentially critical, repeating anomalies. In developing x-PACS, we first lay out a set of desiderata for the anomaly description problem. From a descriptive data mining perspective, our method exhibits five desired properties in our desiderata. Namely, it can unearth anomalous patterns (i) of multiple different types, (ii) hidden in arbitrary subspaces of a high dimensional space, (iii) interpretable by human analysts, (iv) different from normal patterns of the data, and finally (v) succinct, providing a short data description. No existing work on anomaly description satisfies all of these properties simultaneously. Furthermore, x-PACS is highly parallelizable; its running time is linear in the number of data points and exponential in the (typically small) size of the largest characterizing subspace. The anomalous patterns that x-PACS finds constitute interpretable “signatures”, and while it is not our primary goal, they can be used for anomaly detection. Through extensive experiments on real-world datasets, we show the effectiveness and superiority of x-PACS in anomaly explanation over various baselines, and demonstrate its competitive detection performance as compared to the state-of-the-art.

Footnotes
1
In this text, the phrases ‘anomalous pattern’, ‘clustered anomalies’, and ‘group of anomalies’ are used interchangeably.
 
2
X- refers to the number of packs, which we automatically identify via our data encoding scheme (Sect. 3.3). This naming convention follows X-means (Pelleg and Moore 2000), which finds the number of k-means clusters automatically in an information-theoretic way.
 
3
KDE involves two parameters: the number of points sampled to construct the smooth curve and the kernel bandwidth. We set the sample size to 512 points and use Silverman’s rule of thumb (Silverman 2018) to set the bandwidth.
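As a concrete sketch of this setup (the function names are ours, not from the paper; this assumes a univariate numeric feature), Silverman's bandwidth and a Gaussian KDE evaluated on 512 grid points can be computed as:

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule of thumb: h = 0.9 * min(sigma, IQR/1.34) * n^(-1/5)."""
    n = len(x)
    sigma = np.std(x, ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    return 0.9 * min(sigma, (q75 - q25) / 1.34) * n ** (-1 / 5)

def kde_curve(x, grid_size=512):
    """Evaluate a Gaussian KDE of the sample x on grid_size equally spaced points."""
    h = silverman_bandwidth(x)
    grid = np.linspace(x.min() - 3 * h, x.max() + 3 * h, grid_size)
    # one Gaussian kernel per data point, averaged and normalized by h
    z = (grid[:, None] - x[None, :]) / h
    dens = np.exp(-0.5 * z ** 2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))
    return grid, dens
```

Libraries such as SciPy provide equivalent functionality, but the explicit form makes the two parameters of the footnote visible.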
 
4
For categorical features, we would instead use histogram density estimation.
 
5
We use all combinations of \(\alpha \in \{10^{-6}, 10^{-5},\ldots , 1\}\) and \(\lambda \in \{10^{-3}, 10^{-2},\ldots , 10^3\}\).
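The resulting \(7 \times 7\) grid of candidate \((\alpha, \lambda)\) pairs can be enumerated directly (variable names are ours, for illustration only):

```python
import itertools

# candidate values from the footnote: alpha in {1e-6, ..., 1}, lambda in {1e-3, ..., 1e3}
alphas = [10.0 ** e for e in range(-6, 1)]
lambdas = [10.0 ** e for e in range(-3, 4)]
grid = list(itertools.product(alphas, lambdas))  # all 49 (alpha, lambda) pairs
```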
 
6
If the anomalous patterns are to be used for detection, we estimate a full \(\mathbf {U}\) matrix (i.e., possibly rotated ellipsoid).
 
7
Value of f is chosen according to the required floating point precision in the normalized feature space \(\mathbb {R}^{d}\).
 
8
Cost of encoding an arbitrary integer K is \(L_{\mathbb {N}}(K) = \log ^\star (K) + \log _2(c)\), where \(c \approx 2.865064\) and \(\log ^\star (K) = \log _2(K) + \log _2(\log _2(K)) + \ldots \) summing only the positive terms (Rissanen 1978). We drop \(\log _2(c)\) as it is constant for all packings.
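The universal code length above can be computed with a short helper (a sketch; the function names are ours):

```python
import math

def log_star(k):
    """log*(K) = log2(K) + log2(log2(K)) + ..., summing only the positive terms."""
    total, v = 0.0, float(k)
    while True:
        v = math.log2(v)
        if v <= 0:
            break
        total += v
    return total

def universal_int_cost(k):
    # L_N(K) = log*(K) + log2(c), with c ~= 2.865064 (Rissanen 1978);
    # the paper drops the log2(c) term since it is constant across packings
    return log_star(k) + math.log2(2.865064)
```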
 
9
Another way to identify the normal points in a pack: sort the points by their distance to the center and send the indices of the normal points in this list of length \(m_k\). This costs more for \(n_k\ge 2\): \(n_k\log_2 m_k> \log_2 \frac{m_k^{n_k}}{n_k!} > \log_2 \binom{m_k}{n_k}\).
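The two encoding costs compared in this footnote can be checked numerically (a sketch; names are ours):

```python
import math

def cost_index_list(m_k, n_k):
    # send the index of each of the n_k normal points in a sorted list of m_k points
    return n_k * math.log2(m_k)

def cost_subset(m_k, n_k):
    # send which n_k-subset of the m_k points is normal: log2(C(m_k, n_k)) bits
    return math.log2(math.comb(m_k, n_k))
```

For \(n_k \ge 2\) the index-list encoding is strictly more expensive, while for \(n_k = 1\) the two coincide.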
 
10
Intuitively, this is where \(R_{\ell }\) drops when we add a new pack to \(\mathcal {S}\) (with positive cost) that does not cover any new anomalies.
 
11
For instance, if we have t \(d_{\max }\)-dimensional hyper-rectangles, the complexity would be \(O(t2^{d_{\max }} + md_{\max })\), which we can rewrite as \(O(c^{d_{\max }} + md_{\max })\).
 
12
In practice, the solver converges in 20–100 iterations.
 
14
Note that, like any supervised method, x-PACS could only detect future instances of anomalies of known types.
 
15
SVDD optimization diverged for some high-dimensional datasets; we therefore applied PCA as a preprocessing step.
 
Literature
Aggarwal CC (2015) Outlier analysis. Springer, Cham, pp 237–263
Aggarwal CC, Wolf JL, Yu PS, Procopiuc C, Park JS (1999) Fast algorithms for projected clustering. In: Proceedings of the 1999 ACM SIGMOD international conference on management of data, SIGMOD ’99. ACM, New York, NY, USA, pp 61–72
Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data, SIGMOD ’98. ACM, New York, NY, USA, pp 94–105
Angiulli F, Fassetti F, Palopoli L (2009) Detecting outlying properties of exceptional objects. ACM Trans Database Syst (TODS) 34(1):7:1–7:62
Angiulli F, Fassetti F, Palopoli L (2013) Discovering characterizations of the behavior of anomalous subpopulations. IEEE Trans Knowl Data Eng 25(6):1280–1292
Buchbinder N, Feldman M, Naor JS, Schwartz R (2014) Submodular maximization with cardinality constraints. In: Proceedings of the twenty-fifth annual ACM-SIAM symposium on discrete algorithms, SODA ’14. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp 1433–1452
Cheng C-H, Fu AW, Zhang Y (1999) Entropy-based subspace clustering for mining numerical data. In: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’99. ACM, New York, NY, USA, pp 84–93
Clark P, Niblett T (1989) The CN2 induction algorithm. Mach Learn 3(4):261–283
Cohen WW (1995) Fast effective rule induction. In: Prieditis A, Russell S (eds) Machine learning proceedings. Morgan Kaufmann, San Francisco, pp 115–123
Dang XH, Assent I, Ng RT, Zimek A, Schubert E (2014) Discriminative features for identifying and interpreting outliers. In: 2014 IEEE 30th international conference on data engineering, pp 88–99
Dang XH, Micenková B, Assent I, Ng RT (2013) Local outlier detection with interpretation. In: Blockeel H, Kersting K, Nijssen S, Železný F (eds) Machine learning and knowledge discovery in databases. Springer, Berlin, Heidelberg, pp 304–320
Dave V, Guha S, Zhang Y (2012) Measuring and fingerprinting click-spam in ad networks. In: Proceedings of the ACM SIGCOMM 2012 conference on applications, technologies, architectures, and protocols for computer communication, SIGCOMM ’12. ACM, New York, NY, USA, pp 175–186
Fong RC, Vedaldi A (2017) Interpretable explanations of black boxes by meaningful perturbation. In: 2017 IEEE international conference on computer vision (ICCV), pp 3449–3457
Gamberger D, Lavrac N (2002) Expert-guided subgroup discovery: methodology and application. J Artif Int Res 17(1):501–527
Gharan SO, Vondrák J (2011) Submodular maximization by simulated annealing. In: Proceedings of the twenty-second annual ACM-SIAM symposium on discrete algorithms, SODA ’11. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp 1098–1116
Görnitz N, Kloft M, Brefeld U (2009) Active and semi-supervised data domain description. In: Buntine W, Grobelnik M, Mladenić D, Shawe-Taylor J (eds) Machine learning and knowledge discovery in databases. Springer, Berlin, Heidelberg, pp 407–422
Günnemann S, Seidl T, Krieger R, Müller E, Assent I (2009) Relevant subspace clustering: mining the most interesting non-redundant concepts in high dimensional data. In: 2009 Ninth IEEE international conference on data mining (ICDM), pp 377–386
He J, Carbonell J (2010) Co-selection of features and instances for unsupervised rare category analysis. In: Proceedings of the 10th SIAM international conference on data mining, SDM 2010, pp 525–536
He J, Tong H, Carbonell J (2010) Rare category characterization. In: 2010 IEEE international conference on data mining, pp 226–235
Herrera F, Carmona CJ, González P, del Jesus MJ (2011) An overview on subgroup discovery: foundations and applications. Knowl Inf Syst 29(3):495–525
Keller F, Müller E, Böhm K (2012) HiCS: high contrast subspaces for density-based outlier ranking. In: 2012 IEEE 28th international conference on data engineering, pp 1037–1048
Keller F, Müller E, Wixler A, Böhm K (2013) Flexible and adaptive subspace search for outlier analysis. In: Proceedings of the 22nd ACM international conference on information and knowledge management, CIKM ’13. ACM, New York, NY, USA, pp 1381–1390
Klösgen W (1996) Explora: a multipattern and multistrategy discovery assistant. In: Advances in knowledge discovery and data mining. American Association for Artificial Intelligence, Menlo Park, CA, USA, pp 249–271
Klösgen W, May M (2002) Census data mining: an application. In: Proceedings of the 6th European conference on principles and practice of knowledge discovery in databases (PKDD), Helsinki, Finland
Knorr EM, Ng RT (1999) Finding intensional knowledge of distance-based outliers. In: Proceedings of the 25th international conference on very large data bases, VLDB ’99. Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp 211–222
Koh PW, Liang P (2017) Understanding black-box predictions via influence functions. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning, vol 70 of Proceedings of machine learning research. International Convention Centre, Sydney, Australia, pp 1885–1894
Kopp M, Pevný T, Holeňa M (2014) Interpreting and clustering outliers with sapling random forests. In: ITAT 2014. European conference on information technologies: applications and theory. Institute of Computer Science AS CR, pp 61–67
Kriegel HP, Kröger P, Renz M, Wurst S (2005) A generic framework for efficient subspace clustering of high-dimensional data. In: Fifth IEEE international conference on data mining (ICDM’05), p 8
Kriegel H-P, Kröger P, Schubert E, Zimek A (2009a) Outlier detection in axis-parallel subspaces of high dimensional data. In: Theeramunkong T, Kijsirikul B, Cercone N, Ho T-B (eds) Advances in knowledge discovery and data mining. Springer, Berlin, Heidelberg, pp 831–838
Kriegel H-P, Kröger P, Zimek A (2009b) Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Discov Data 3(1):1:1–1:58
Kriegel H-P, Kröger P, Schubert E, Zimek A (2012) Outlier detection in arbitrarily oriented subspaces. In: 2012 IEEE 12th international conference on data mining, pp 379–388
Kuo C-T, Davidson I (2016) A framework for outlier description using constraint programming. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, AAAI’16. AAAI Press, pp 1237–1243
Lakkaraju H, Kamar E, Caruana R, Leskovec J (2017) Interpretable and explorable approximations of black box models. CoRR abs/1707.01154
Lazarevic A, Kumar V (2005) Feature bagging for outlier detection. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining, KDD ’05. ACM, New York, NY, USA, pp 157–166
Lee K, Eoff BD, Caverlee J (2011) Seven months with the devils: a long-term study of content polluters on twitter. In: AAAI international conference on weblogs and social media (ICWSM)
Loekito E, Bailey J (2008) Mining influential attributes that capture class and group contrast behaviour. In: Proceedings of the 17th ACM conference on information and knowledge management, CIKM ’08. ACM, New York, NY, USA, pp 971–980
Micenková B, Ng RT, Dang XH, Assent I (2013) Explaining outliers by subspace separability. In: 2013 IEEE 13th international conference on data mining, pp 518–527
Moise G, Sander J, Ester M (2006) P3C: a robust projected clustering algorithm. In: Sixth international conference on data mining (ICDM’06), pp 414–425
Montavon G, Samek W, Müller K (2018) Methods for interpreting and understanding deep neural networks. Digit Signal Process Rev J 73:1–15
Mukherjee A, Venkataraman V, Liu B, Glance N (2013) What yelp fake review filter might be doing? In: 7th international AAAI conference on weblogs and social media, ICWSM 2013. AAAI Press
Müller E, Assent I, Steinhausen U, Seidl T (2008) OutRank: ranking outliers in high dimensional data. In: 2008 IEEE 24th international conference on data engineering workshop, pp 600–603
Müller E, Assent I, Iglesias P, Mülle Y, Böhm K (2012) Outlier ranking via subspace analysis in multiple views of the data. In: 2012 IEEE 12th international conference on data mining, pp 529–538
Müller E, Schiffer M, Seidl T (2011) Statistical selection of relevant subspace projections for outlier ranking. In: 2011 IEEE 27th international conference on data engineering, pp 434–445
Parsons L, Haque E, Liu H (2004) Subspace clustering for high dimensional data: a review. SIGKDD Explor Newsl 6(1):90–105
Pelleg D, Moore AW (2000) X-means: extending K-means with efficient estimation of the number of clusters. In: Proceedings of the seventeenth international conference on machine learning, ICML ’00. Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp 727–734
Pevný T, Kopp M (2014) Explaining anomalies with sapling random forests. In: Information technologies: applications and theory workshops, posters, and tutorials (ITAT 2014)
Ribeiro MT, Singh S, Guestrin C (2016) Why should I trust you?: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1135–1144
Sequeira K, Zaki M (2004) SCHISM: a new approach for interesting subspace mining. In: Fourth IEEE international conference on data mining (ICDM ’04), pp 186–193
Silverman BW (2018) Density estimation for statistics and data analysis. Routledge, Abingdon
Ting KM, Liu FT, Zhou Z (2008) Isolation forest. In: 2008 Eighth IEEE international conference on data mining (ICDM), pp 413–422
Wrobel S (1997) An algorithm for multi-relational discovery of subgroups. In: Komorowski J, Zytkow J (eds) Principles of data mining and knowledge discovery. Springer, Berlin, Heidelberg, pp 78–87
Zhang H, Diao Y, Meliou A (2017) EXstream: explaining anomalies in event stream monitoring. In: Proceedings of the 20th international conference on extending database technology (EDBT), pp 156–167
Metadata
Title
Explaining anomalies in groups with characterizing subspace rules
Authors
Meghanath Macha
Leman Akoglu
Publication date
02-08-2018
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 5/2018
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-018-0585-7
