
Knowledge and Information Systems


01.04.2016 | Regular Paper

Efficient discovery of contrast subspaces for object explanation and characterization

Authors: Lei Duan, Guanting Tang, Jian Pei, James Bailey, Guozhu Dong, Vinh Nguyen, Akiko Campbell, Changjie Tang

Published in: Knowledge and Information Systems | Issue 1/2016


Abstract

We tackle the novel problem of mining contrast subspaces. Given a set of multidimensional objects in two classes \(C_+\) and \(C_-\) and a query object \(o\), we want to find the top-\(k\) subspaces that maximize the ratio of the likelihood of \(o\) in \(C_+\) to that in \(C_-\). Such subspaces are very useful for characterizing an object and explaining how it differs between the two classes. We demonstrate that this problem has important applications and, at the same time, is very challenging: it is MAX SNP-hard. We present CSMiner, a mining method that uses kernel density estimation in conjunction with various pruning techniques. We experimentally investigate the performance of CSMiner on a range of data sets, evaluating its efficiency, effectiveness, and stability, and demonstrate that it is substantially faster than a baseline method.
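To make the problem statement concrete, the sketch below scores every subspace of up to `max_dim` dimensions by the ratio \(L(o \mid C_+) / L(o \mid C_-)\), where each likelihood is a Gaussian kernel density estimate, and returns the top-\(k\). It illustrates the objective only and is not CSMiner: it enumerates subspaces exhaustively and reproduces none of the paper's pruning techniques. Assumptions not taken from the source: a Gaussian product kernel, Silverman's rule-of-thumb bandwidth computed per class and per subspace, a cap on subspace dimensionality, and a tiny floor on the \(C_-\) likelihood to avoid division by zero. All function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def silverman_bandwidth(data):
    """Per-dimension rule-of-thumb bandwidth for a Gaussian kernel (assumed choice)."""
    n, d = data.shape
    sigma = data.std(axis=0, ddof=1)
    sigma = np.where(sigma > 0, sigma, 1e-6)  # guard against constant columns
    return sigma * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))


def kde_likelihood(query, data, bandwidth):
    """Gaussian product-kernel density estimate of `query` from the rows of `data`."""
    d = data.shape[1]
    z = (data - query) / bandwidth                   # standardized offsets, shape (n, d)
    kernels = np.exp(-0.5 * np.sum(z ** 2, axis=1))  # one kernel value per data point
    norm = (2.0 * np.pi) ** (d / 2.0) * np.prod(bandwidth)
    return float(np.mean(kernels) / norm)


def top_k_contrast_subspaces(query, pos, neg, k=3, max_dim=2, floor=1e-300):
    """Exhaustively score subspaces by L(o | C+) / L(o | C-) and return the k best."""
    scored = []
    for size in range(1, max_dim + 1):
        for subspace in combinations(range(pos.shape[1]), size):
            cols = list(subspace)
            q, p_sub, n_sub = query[cols], pos[:, cols], neg[:, cols]
            like_pos = kde_likelihood(q, p_sub, silverman_bandwidth(p_sub))
            like_neg = kde_likelihood(q, n_sub, silverman_bandwidth(n_sub))
            scored.append((like_pos / max(like_neg, floor), subspace))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Usage on synthetic data: C- is shifted away from the query in dimension 1 only,
# so subspaces containing dimension 1 should rank highest.
rng = np.random.default_rng(0)
pos = rng.normal(0.0, 1.0, size=(200, 3))
neg = rng.normal(0.0, 1.0, size=(200, 3))
neg[:, 1] += 3.0
for score, subspace in top_k_contrast_subspaces(np.zeros(3), pos, neg, k=3):
    print(subspace, round(score, 1))
```

Because a data set with \(d\) dimensions has \(2^d - 1\) non-empty subspaces, exhaustive scoring like this becomes impractical as \(d\) grows, which is why pruning matters.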


While [8] presented an algorithm based on contrast-pattern length to detect global outliers, their problem setting differs from ours.

In general, given a set of observations \(Q\), the relative plausibility of two models \(M_1\) and \(M_2\) can be assessed by the Bayes factor \(K=\frac{\Pr(Q\mid M_1)}{\Pr(Q\mid M_2)}\).
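The Bayes factor can be made concrete with two fully specified models (no free parameters, so each marginal likelihood reduces to a plain likelihood) and i.i.d. observations. The sketch below assumes univariate Gaussian models purely for illustration; the function names are hypothetical. Read this way, the likelihood ratio in the abstract corresponds to \(Q=\{o\}\), with the class-conditional densities of \(C_+\) and \(C_-\) playing the roles of \(M_1\) and \(M_2\).

```python
import math


def gaussian_log_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def bayes_factor(observations, model_1, model_2):
    """K = Pr(Q | M1) / Pr(Q | M2) for i.i.d. observations under two fixed Gaussian models.

    Summing log-likelihoods before exponentiating avoids underflow when Q is large.
    """
    log_l1 = sum(gaussian_log_pdf(x, *model_1) for x in observations)
    log_l2 = sum(gaussian_log_pdf(x, *model_2) for x in observations)
    return math.exp(log_l1 - log_l2)


# Q = {0.0}; M1 = N(0, 1), M2 = N(1, 1). Here K = exp(0.5) ≈ 1.65, so Q mildly favors M1.
print(bayes_factor([0.0], (0.0, 1.0), (1.0, 1.0)))
```

A value \(K > 1\) favors \(M_1\) and \(K < 1\) favors \(M_2\); identical models give \(K = 1\).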

If the kernel is not unimodal, there could be multiple peaks at different distances from the query, which is counterintuitive. Similarly, we have no basis for preferring any direction over another, so symmetry is natural.
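The two kernel properties discussed above can be checked numerically. The sketch below uses a 1-D Gaussian kernel as an assumed example and contrasts it with two hypothetical alternatives: a symmetric but bimodal kernel, whose peaks sit away from the query, and a shifted kernel, which prefers one direction. A check on a finite grid is only a sanity test, not a proof; all names are illustrative.

```python
import math


def gaussian_kernel(u, h=1.0):
    """1-D Gaussian kernel with bandwidth h; depends on u only through u**2."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def bimodal_kernel(u):
    """Hypothetical symmetric kernel with two peaks, at u = -1.5 and u = +1.5."""
    return 0.5 * gaussian_kernel(u - 1.5) + 0.5 * gaussian_kernel(u + 1.5)


def shifted_kernel(u):
    """Hypothetical unimodal kernel whose peak is offset to u = +0.5."""
    return gaussian_kernel(u - 0.5)


def is_symmetric(kernel, grid):
    """True if K(u) == K(-u), up to float tolerance, at every grid point."""
    return all(math.isclose(kernel(u), kernel(-u)) for u in grid)


def non_increasing_in_distance(kernel, grid):
    """True if K never increases as u moves away from 0 along a sorted non-negative grid."""
    values = [kernel(u) for u in grid]
    return all(a >= b for a, b in zip(values, values[1:]))


grid = [0.1 * i for i in range(31)]  # distances 0.0 .. 3.0 from the query
for name, k in [("gaussian", gaussian_kernel), ("bimodal", bimodal_kernel), ("shifted", shifted_kernel)]:
    print(name, is_symmetric(k, grid), non_increasing_in_distance(k, grid))
```

For a 1-D kernel, being symmetric and non-increasing in \(|u|\) together means it is unimodal with its mode at the query, which is the behavior the footnote argues for.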

1. Aggarwal CC, Yu PS (2001) Outlier detection for high dimensional data. ACM SIGMOD Rec 30:37–46

2. Bache K, Lichman M (2013) UCI machine learning repository

3. Bay SD, Pazzani MJ (2001) Detecting group differences: mining contrast sets. Data Min Knowl Discov 5(3):213–246

4. Beyer K, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? In: Proc. of the 7th Int’l Conf on Database Theory, pp 217–235

5. Böhm K, Keller F, Müller E, Nguyen HV, Vreeken J (2013) CMI: an information-theoretic contrast measure for enhancing subspace cluster and outlier detection. In: Proc. of the 13th SIAM Int’l Conf on Data Mining, pp 198–206

6. Breunig MM, Kriegel HP, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: Proc. of the 2000 ACM SIGMOD Int’l Conf on Management of Data, pp 93–104

7. Cai Y, Zhao HK, Han H, Lau RYK, Leung HF, Min H (2012) Answering typicality query based on automatically prototype construction. In: Proc. of the 2012 IEEE/WIC/ACM Int’l Joint Conf on Web Intelligence and Intelligent Agent Technology, vol 1, pp 362–366

8. Chen L, Dong G (2006) Masquerader detection using OCLEP: one class classification using length statistics of emerging patterns. In: Proc. of the Int’l Workshop on Information Processing over Evolving Networks (WINPEN), p 5

9. Dong G, Bailey J (eds) (2013) Contrast data mining: concepts, algorithms, and applications. CRC Press, Boca Raton

10. Dong G, Li J (1999) Efficient mining of emerging patterns: discovering trends and differences. In: Proc. of the 5th ACM SIGKDD Int’l Conf on Knowledge Discovery and Data Mining, pp 43–52

11. Duan L, Tang G, Pei J, Bailey J, Dong G, Campbell A, Tang C (2014) Mining contrast subspaces. In: Proc. of the 18th Pacific-Asia Conf on Knowledge Discovery and Data Mining, pp 249–260

12. Fagin R, Kumar R, Sivakumar D (2003) Comparing top k lists. In: Proc. of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp 28–36

13. He Z, Xu X, Huang ZJ, Deng S (2005) FP-outlier: frequent pattern based outlier detection. Comput Sci Inf Syst 2(1):103–118

14. Hua M, Pei J, Fu AW, Lin X, Leung HF (2009) Top-k typicality queries and efficient query answering methods on large databases. VLDB J 18(3):809–835

15. Jeffreys H (1961) The theory of probability, 3rd edn. Oxford

16. Keller F, Müller E, Böhm K (2012) HiCS: high contrast subspaces for density-based outlier ranking. In: Proc. of the IEEE 28th Int’l Conf on Data Engineering, pp 1037–1048

17. Kriegel HP, Schubert M, Zimek A (2008) Angle-based outlier detection in high-dimensional data. In: Proc. of the 14th ACM SIGKDD Int’l Conf on Knowledge Discovery and Data Mining, pp 444–452

18. Kriegel HP, Kröger P, Schubert E, Zimek A (2009) Outlier detection in axis-parallel subspaces of high dimensional data. In: Proc. of the 13th Pacific-Asia Conf on Knowledge Discovery and Data Mining, pp 831–838

19. Novak PK, Lavrac N, Webb GI (2009) Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. J Mach Learn Res 10:377–403

20. Papadimitriou CH, Yannakakis M (1991) Optimization, approximation, and complexity classes. J Comput Syst Sci 43(3):425–440

21. Rymon R (1992) Search through systematic set enumeration. In: Proc. of the 3rd Int’l Conf on Principles of Knowledge Representation and Reasoning, pp 539–550

22. Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall/CRC, London

23. Wang L, Zhao H, Dong G, Li J (2005) On the complexity of finding emerging patterns. Theor Comput Sci 335(1):15–27

24. Webber W, Moffat A, Zobel J (2010) A similarity measure for indefinite rankings. ACM Trans Inf Syst 28(4):20:1–20:38

25. Wrobel S (1997) An algorithm for multi-relational discovery of subgroups. In: Proc. of the 1st European Symposium on Principles of Data Mining and Knowledge Discovery, pp 78–87

26. Wu S, Crestani F (2003) Methods for ranking information retrieval systems without relevance judgments. In: Proc. of the 2003 ACM Symposium on Applied Computing. ACM, New York, NY, USA, pp 811–816

Title: Efficient discovery of contrast subspaces for object explanation and characterization
Authors: Lei Duan, Guanting Tang, Jian Pei, James Bailey, Guozhu Dong, Vinh Nguyen, Akiko Campbell, Changjie Tang
Publication date: 01.04.2016
Publisher: Springer London
Published in: Knowledge and Information Systems / Issue 1/2016
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-015-0835-6
