Published in: Data Mining and Knowledge Discovery 5/2018

02-08-2018

Explaining anomalies in groups with characterizing subspace rules

Authors: Meghanath Macha, Leman Akoglu

Abstract

Anomaly detection has numerous applications and has been studied extensively. We consider a complementary problem that has a much sparser literature: anomaly description. Interpretation of anomalies is crucial for practitioners for sense-making, troubleshooting, and planning actions. To this end, we present a new approach called x-PACS (for eXplaining Patterns of Anomalies with Characterizing Subspaces), which “reverse-engineers” the known anomalies by identifying (1) the groups (or patterns) that they form, and (2) the characterizing subspace and feature rules that separate each anomalous pattern from normal instances. Explaining anomalies in groups not only saves analyst time and gives insight into various types of anomalies, but also draws attention to potentially critical, repeating anomalies. In developing x-PACS, we first lay out a set of desiderata for the anomaly description problem. From a descriptive data mining perspective, our method exhibits five desired properties in our desiderata. Namely, it can unearth anomalous patterns (i) of multiple different types, (ii) hidden in arbitrary subspaces of a high dimensional space, (iii) interpretable by human analysts, (iv) different from normal patterns of the data, and finally (v) succinct, providing a short data description. No existing work on anomaly description satisfies all of these properties simultaneously. Furthermore, x-PACS is highly parallelizable; its running time is linear in the number of data points and exponential in the (typically small) size of the largest characterizing subspace. The anomalous patterns that x-PACS finds constitute interpretable “signatures”, and while it is not our primary goal, they can be used for anomaly detection. Through extensive experiments on real-world datasets, we show the effectiveness and superiority of x-PACS in anomaly explanation over various baselines, and demonstrate its competitive detection performance as compared to the state-of-the-art.

Footnotes
1
In this text, the phrases ‘anomalous pattern’, ‘clustered anomalies’, and ‘group of anomalies’ are used interchangeably.
 
2
X- refers to the number of packs, which we automatically identify via our data encoding scheme (Sect. 3.3). This naming convention follows X-means (Pelleg and Moore 2000), which finds the number of k-means clusters automatically in an information-theoretic way.
 
3
KDE involves two parameters: the number of points sampled to construct the smooth curve and the kernel bandwidth. We set the sample size to 512 points and use Silverman’s rule of thumb (Silverman 2018) to set the bandwidth.
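As a concrete sketch of this setup (the function names are ours, not from the paper; this assumes a univariate numeric feature), Silverman's bandwidth and a Gaussian KDE evaluated on 512 grid points can be computed as:

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule of thumb: h = 0.9 * min(sigma, IQR/1.34) * n^(-1/5)."""
    n = len(x)
    sigma = np.std(x, ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    return 0.9 * min(sigma, (q75 - q25) / 1.34) * n ** (-1 / 5)

def kde_curve(x, grid_size=512):
    """Evaluate a Gaussian KDE of the sample x on grid_size equally spaced points."""
    h = silverman_bandwidth(x)
    grid = np.linspace(x.min() - 3 * h, x.max() + 3 * h, grid_size)
    # one Gaussian kernel per data point, averaged and normalized by h
    z = (grid[:, None] - x[None, :]) / h
    dens = np.exp(-0.5 * z ** 2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))
    return grid, dens
```

Libraries such as SciPy provide equivalent functionality, but the explicit form makes the two parameters of the footnote visible.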
 
4
For categorical features, we would instead use histogram density estimation.
 
5
We use all combinations of \(\alpha \in \{10^{-6}, 10^{-5},\ldots , 1\}\) and \(\lambda \in \{10^{-3}, 10^{-2},\ldots , 10^3\}\).
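The resulting \(7 \times 7\) grid of candidate \((\alpha, \lambda)\) pairs can be enumerated directly (variable names are ours, for illustration only):

```python
import itertools

# candidate values from the footnote: alpha in {1e-6, ..., 1}, lambda in {1e-3, ..., 1e3}
alphas = [10.0 ** e for e in range(-6, 1)]
lambdas = [10.0 ** e for e in range(-3, 4)]
grid = list(itertools.product(alphas, lambdas))  # all 49 (alpha, lambda) pairs
```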
 
6
If the anomalous patterns are to be used for detection, we estimate a full \(\mathbf {U}\) matrix (i.e., possibly rotated ellipsoid).
 
7
Value of f is chosen according to the required floating point precision in the normalized feature space \(\mathbb {R}^{d}\).
 
8
Cost of encoding an arbitrary integer K is \(L_{\mathbb {N}}(K) = \log ^\star (K) + \log _2(c)\), where \(c \approx 2.865064\) and \(\log ^\star (K) = \log _2(K) + \log _2(\log _2(K)) + \ldots \) summing only the positive terms (Rissanen 1978). We drop \(\log _2(c)\) as it is constant for all packings.
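The universal code length above can be computed with a short helper (a sketch; the function names are ours):

```python
import math

def log_star(k):
    """log*(K) = log2(K) + log2(log2(K)) + ..., summing only the positive terms."""
    total, v = 0.0, float(k)
    while True:
        v = math.log2(v)
        if v <= 0:
            break
        total += v
    return total

def universal_int_cost(k):
    # L_N(K) = log*(K) + log2(c), with c ~= 2.865064 (Rissanen 1978);
    # the paper drops the log2(c) term since it is constant across packings
    return log_star(k) + math.log2(2.865064)
```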
 
9
Another way to identify the normal points in a pack: sort the points by their distance to the center and send the indices of the normal points in this list of length \(m_k\). This costs more for \(n_k\ge 2\): \(n_k\log_2 m_k> \log_2 \frac{m_k^{n_k}}{n_k!} > \log_2 \binom{m_k}{n_k}\).
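The two encoding costs compared in this footnote can be checked numerically (a sketch; names are ours):

```python
import math

def cost_index_list(m_k, n_k):
    # send the index of each of the n_k normal points in a sorted list of m_k points
    return n_k * math.log2(m_k)

def cost_subset(m_k, n_k):
    # send which n_k-subset of the m_k points is normal: log2(C(m_k, n_k)) bits
    return math.log2(math.comb(m_k, n_k))
```

For \(n_k \ge 2\) the index-list encoding is strictly more expensive, while for \(n_k = 1\) the two coincide.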
 
10
Intuitively, this is where \(R_{\ell }\) drops when we add a new pack to \(\mathcal {S}\) (with positive cost) that does not cover any new anomalies.
 
11
For instance, if we have t \(d_{\max }\)-dimensional hyper-rectangles, the complexity would be \(O(t2^{d_{\max }} + md_{\max })\), which we can rewrite as \(O(c^{d_{\max }} + md_{\max })\).
 
12
In practice, the solver converges in 20–100 iterations.
 
14
Note that, like any supervised method, x-PACS could only detect future instances of anomalies of known types.
 
15
SVDD optimization diverged for some high-dimensional datasets; we therefore applied PCA as a preprocessing step.
 
Literature
Aggarwal CC (2015) Outlier analysis. Springer, Cham, pp 237–263
Aggarwal CC, Wolf JL, Yu PS, Procopiuc C, Park JS (1999) Fast algorithms for projected clustering. In: Proceedings of the 1999 ACM SIGMOD international conference on management of data, SIGMOD ’99. ACM, New York, NY, USA, pp 61–72
Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data, SIGMOD ’98. ACM, New York, NY, USA, pp 94–105
Angiulli F, Fassetti F, Palopoli L (2009) Detecting outlying properties of exceptional objects. ACM Trans Database Syst (TODS) 34(1):7:1–7:62
Angiulli F, Fassetti F, Palopoli L (2013) Discovering characterizations of the behavior of anomalous subpopulations. IEEE Trans Knowl Data Eng 25(6):1280–1292
Buchbinder N, Feldman M, Naor JS, Schwartz R (2014) Submodular maximization with cardinality constraints. In: Proceedings of the twenty-fifth annual ACM-SIAM symposium on discrete algorithms, SODA ’14. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp 1433–1452
Cheng C-H, Fu AW, Zhang Y (1999) Entropy-based subspace clustering for mining numerical data. In: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’99. ACM, New York, NY, USA, pp 84–93
Clark P, Niblett T (1989) The CN2 induction algorithm. Mach Learn 3(4):261–283
Cohen WW (1995) Fast effective rule induction. In: Prieditis A, Russell S (eds) Machine learning proceedings. Morgan Kaufmann, San Francisco, pp 115–123
Dang XH, Assent I, Ng RT, Zimek A, Schubert E (2014) Discriminative features for identifying and interpreting outliers. In: 2014 IEEE 30th international conference on data engineering, pp 88–99
Dang XH, Micenková B, Assent I, Ng RT (2013) Local outlier detection with interpretation. In: Blockeel H, Kersting K, Nijssen S, Železný F (eds) Machine learning and knowledge discovery in databases. Springer, Berlin, Heidelberg, pp 304–320
Dave V, Guha S, Zhang Y (2012) Measuring and fingerprinting click-spam in ad networks. In: Proceedings of the ACM SIGCOMM 2012 conference on applications, technologies, architectures, and protocols for computer communication, SIGCOMM ’12. ACM, New York, NY, USA, pp 175–186
Fong RC, Vedaldi A (2017) Interpretable explanations of black boxes by meaningful perturbation. In: 2017 IEEE international conference on computer vision (ICCV), pp 3449–3457
Gamberger D, Lavrac N (2002) Expert-guided subgroup discovery: methodology and application. J Artif Int Res 17(1):501–527
Gharan SO, Vondrák J (2011) Submodular maximization by simulated annealing. In: Proceedings of the twenty-second annual ACM-SIAM symposium on discrete algorithms, SODA ’11. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp 1098–1116
Görnitz N, Kloft M, Brefeld U (2009) Active and semi-supervised data domain description. In: Buntine W, Grobelnik M, Mladenić D, Shawe-Taylor J (eds) Machine learning and knowledge discovery in databases. Springer, Berlin, Heidelberg, pp 407–422
Günnemann S, Seidl T, Krieger R, Müller E, Assent I (2009) Relevant subspace clustering: mining the most interesting non-redundant concepts in high dimensional data. In: 2009 Ninth IEEE international conference on data mining (ICDM), pp 377–386
He J, Carbonell J (2010) Co-selection of features and instances for unsupervised rare category analysis. In: Proceedings of the 10th SIAM international conference on data mining, SDM 2010, pp 525–536
He J, Tong H, Carbonell J (2010) Rare category characterization. In: 2010 IEEE international conference on data mining, pp 226–235
Herrera F, Carmona CJ, González P, del Jesus MJ (2011) An overview on subgroup discovery: foundations and applications. Knowl Inf Syst 29(3):495–525
Keller F, Müller E, Böhm K (2012) HiCS: high contrast subspaces for density-based outlier ranking. In: 2012 IEEE 28th international conference on data engineering, pp 1037–1048
Keller F, Müller E, Wixler A, Böhm K (2013) Flexible and adaptive subspace search for outlier analysis. In: Proceedings of the 22nd ACM international conference on information and knowledge management, CIKM ’13. ACM, New York, NY, USA, pp 1381–1390
Klösgen W (1996) Explora: a multipattern and multistrategy discovery assistant. In: Advances in knowledge discovery and data mining. American Association for Artificial Intelligence, Menlo Park, CA, USA, pp 249–271
Klösgen W, May M (2002) Census data mining: an application. In: Proceedings of the 6th European conference on principles and practice of knowledge discovery in databases (PKDD), Helsinki, Finland
Knorr EM, Ng RT (1999) Finding intensional knowledge of distance-based outliers. In: Proceedings of the 25th international conference on very large data bases, VLDB ’99. Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp 211–222
Koh PW, Liang P (2017) Understanding black-box predictions via influence functions. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning, vol 70 of Proceedings of machine learning research. International Convention Centre, Sydney, Australia, pp 1885–1894
Kopp M, Pevný T, Holeňa M (2014) Interpreting and clustering outliers with sapling random forests. In: ITAT 2014. European conference on information technologies: applications and theory. Institute of Computer Science AS CR, pp 61–67
Kriegel HP, Kröger P, Renz M, Wurst S (2005) A generic framework for efficient subspace clustering of high-dimensional data. In: Fifth IEEE international conference on data mining (ICDM’05), p 8
Kriegel H-P, Kröger P, Schubert E, Zimek A (2009a) Outlier detection in axis-parallel subspaces of high dimensional data. In: Theeramunkong T, Kijsirikul B, Cercone N, Ho T-B (eds) Advances in knowledge discovery and data mining. Springer, Berlin, Heidelberg, pp 831–838
Kriegel H-P, Kröger P, Zimek A (2009b) Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Discov Data 3(1):1:1–1:58
Kriegel H-P, Kröger P, Schubert E, Zimek A (2012) Outlier detection in arbitrarily oriented subspaces. In: 2012 IEEE 12th international conference on data mining, pp 379–388
Kuo C-T, Davidson I (2016) A framework for outlier description using constraint programming. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, AAAI’16. AAAI Press, pp 1237–1243
Lakkaraju H, Kamar E, Caruana R, Leskovec J (2017) Interpretable and explorable approximations of black box models. CoRR abs/1707.01154
Lazarevic A, Kumar V (2005) Feature bagging for outlier detection. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining, KDD ’05. ACM, New York, NY, USA, pp 157–166
Lee K, Eoff BD, Caverlee J (2011) Seven months with the devils: a long-term study of content polluters on twitter. In: AAAI international conference on weblogs and social media (ICWSM)
Loekito E, Bailey J (2008) Mining influential attributes that capture class and group contrast behaviour. In: Proceedings of the 17th ACM conference on information and knowledge management, CIKM ’08. ACM, New York, NY, USA, pp 971–980
Micenková B, Ng RT, Dang XH, Assent I (2013) Explaining outliers by subspace separability. In: 2013 IEEE 13th international conference on data mining, pp 518–527
Moise G, Sander J, Ester M (2006) P3C: a robust projected clustering algorithm. In: Sixth international conference on data mining (ICDM’06), pp 414–425
Montavon G, Samek W, Müller K (2018) Methods for interpreting and understanding deep neural networks. Digit Signal Process Rev J 73:1–15
Mukherjee A, Venkataraman V, Liu B, Glance N (2013) What yelp fake review filter might be doing? In: 7th international AAAI conference on weblogs and social media, ICWSM 2013. AAAI Press
Müller E, Assent I, Steinhausen U, Seidl T (2008) OutRank: ranking outliers in high dimensional data. In: 2008 IEEE 24th international conference on data engineering workshop, pp 600–603
Müller E, Assent I, Iglesias P, Mülle Y, Böhm K (2012) Outlier ranking via subspace analysis in multiple views of the data. In: 2012 IEEE 12th international conference on data mining, pp 529–538
Müller E, Schiffer M, Seidl T (2011) Statistical selection of relevant subspace projections for outlier ranking. In: 2011 IEEE 27th international conference on data engineering, pp 434–445
Parsons L, Haque E, Liu H (2004) Subspace clustering for high dimensional data: a review. SIGKDD Explor Newsl 6(1):90–105
Pelleg D, Moore AW (2000) X-means: extending K-means with efficient estimation of the number of clusters. In: Proceedings of the seventeenth international conference on machine learning, ICML ’00. Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp 727–734
Pevný T, Kopp M (2014) Explaining anomalies with sapling random forests. In: Information technologies: applications and theory workshops, posters, and tutorials (ITAT 2014)
Ribeiro MT, Singh S, Guestrin C (2016) Why should I trust you?: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1135–1144
Sequeira K, Zaki M (2004) SCHISM: a new approach for interesting subspace mining. In: Fourth IEEE international conference on data mining (ICDM ’04), pp 186–193
Silverman BW (2018) Density estimation for statistics and data analysis. Routledge, Abingdon
Ting KM, Liu FT, Zhou Z (2008) Isolation forest. In: 2008 Eighth IEEE international conference on data mining (ICDM), pp 413–422
Wrobel S (1997) An algorithm for multi-relational discovery of subgroups. In: Komorowski J, Zytkow J (eds) Principles of data mining and knowledge discovery. Springer, Berlin, Heidelberg, pp 78–87
Zhang H, Diao Y, Meliou A (2017) EXstream: explaining anomalies in event stream monitoring. In: Proceedings of the 20th international conference on extending database technology (EDBT), pp 156–167
Metadata
Title
Explaining anomalies in groups with characterizing subspace rules
Authors
Meghanath Macha
Leman Akoglu
Publication date
02-08-2018
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 5/2018
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-018-0585-7
