nach oben

Knowledge and Information Systems

Erschienen in:

01.08.2014 | Regular Paper

GAMer: a synthesis of subspace clustering and dense subgraph mining

verfasst von: Stephan Günnemann, Ines Färber, Brigitte Boden, Thomas Seidl

Erschienen in: Knowledge and Information Systems | Ausgabe 2/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

In this work, we propose a new method to find homogeneous object groups in a single vertex-labeled graph. The basic premise is that many prevalent datasets consist of multiple types of information: graph data to represent the relations between objects and attribute data to characterize the single objects. Analyzing both information types simultaneously can increase the expressiveness of the resulting patterns. Our patterns of interest are sets of objects that are densely connected within the associated graph and as well show high similarity regarding their attributes. As for attribute data it is known that full-space clustering often is futile, we have to analyze the similarity of objects regarding subsets of their attributes. In order to take full advantage of all present information, we combine the paradigms of dense subgraph mining and subspace clustering. For our approach, we face several challenges to achieve a sound combination of the two paradigms. We maximize our twofold clusters according to their density, size, and number of relevant dimensions. The optimization of these three objectives usually is conflicting; thus, we realize a trade-off between these characteristics to obtain meaningful patterns. We develop a redundancy model to confine the clustering to a manageable size by selecting only the most interesting clusters for the result set. We prove the complexity of our clustering model and we particularly focus on the exploration of various pruning strategies to design the efficient algorithm GAMer (Graph & Attribute Miner). In thorough experiments on synthetic and real world data we show that GAMer achieves low runtimes and high clustering qualities. We provide all datasets, measures, executables, and parameter settings on our website http://dme.rwth-aachen.de/gamer.

Nächster Artikel Compressed representations for web and social graphs

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

http://www.nber.org/patents/.

http://www.cs.cornell.edu/projects/kddcup/datasets.html.

http://dblp.uni-trier.de.

Abello J, Resende M, Sudarsky S et al. (2002) Massive quasi-clique detection. Lecture Notes in Computer Science pp. 598–612

Aggarwal C, Wang H (2010) Managing and mining graph data. Springer, New YorkCrossRefMATH

Aggarwal C, Wolf J, Yu P, Procopiuc C, Park J (1999) Fast algorithms for projected clustering. In: SIGMOD, pp 61–72

Al Hasan M, Chaoji V, Salem S, Besson J, Zaki M (2007) Origami: mining representative orthogonal graph patterns. In: ICDM, pp 153–162

Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is ”nearest neighbor” meaningful?. In: ICDT, pp 217–235

Condon A, Karp RM (2001) Algorithms for graph partitioning on the planted partition model. Random Struct Algorithms 18(2):116–140CrossRefMATHMathSciNet

Ding CHQ, He X, Zha H, Gu M, Simon HD (2001) A min-max cut algorithm for graph partitioning and data clustering. In: ICDM, pp 107–114

Du N, Wu B, Pei X, Wang B, Xu L (2007) Community detection in large-scale social networks. In: WebKDD/SNA-KDD, pp 16–25

Ester M, Ge R, Gao BJ, Hu Z, Ben-Moshe B (2006) Joint cluster analysis of attribute data and relationship data: the connected k-center problem. In: SDM

10.

Garey M, Johnson D (1979) Computers and intractability: a guide to NP-completeness. W.H Freeman and Company, San FranciscoMATH

11.

Günnemann S, Färber I, Boden B, Seidl T (2010) Subspace clustering meets dense subgraph mining: a synthesis of two paradigms. In: ICDM, pp 845–850

12.

Günnemann S, Färber I, Müller E, Assent I, Seidl T (2011) External evaluation measures for subspace clustering. In: CIKM, pp 1363–1372

13.

Günnemann S, Kremer H, Seidl T (2010) Subspace clustering for uncertain data. In: SDM, pp 385–396

14.

Günnemann S, Müller E, Färber I, Seidl T (2009) Detection of orthogonal concepts in subspaces of high dimensional data. In: CIKM, pp 1317–1326

15.

Han J, Kamber M (2006) Data mining: concepts and techniques. Morgan Kaufmann, San Francisco

16.

Hanisch D, Zien A, Zimmer R, Lengauer T (2002) Co-clustering of biological networks and gene expression data. Bioinformatics 18:145–154CrossRef

17.

Jolliffe I (2002) Principal component analysis, 2nd edn. Springer, New YorkMATH

18.

Kailing K, Kriegel HP, Kroeger P (2004) Density-connected subspace clustering for high-dimensional data. In: SDM, pp 246–257

19.

Kriegel HP, Kröger P, Zimek A (2009) Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. TKDD 3(1):1–58CrossRef

20.

Kubica J, Moore AW, Schneider JG (2003) Tractable group detection on large link data sets. In: ICDM, pp 573–576

21.

Liu G, Wong L (2008) Effective pruning techniques for mining quasi-cliques. In: ECML/PKDD (2). pp 33–49

22.

Long B, Wu X, Zhang ZM, Yu PS (2006) Unsupervised learning on k-partite graphs. In: KDD, pp 317–326

23.

Long B, Zhang ZM, Yu PS (2007) A probabilistic framework for relational clustering. In: KDD, pp 470–479

24.

Moise G, Sander J (2008) Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering. In: KDD, pp 533–541

25.

Moise G, Sander J, Ester M (2006) P3C: a robust projected clustering algorithm. In: ICDM, pp 414–425

26.

Moser F, Colak R, Rafiey A, Ester M (2009) Mining cohesive patterns from graphs with feature vectors. In: SDM, pp 593–604

27.

Müller E, Assent I, Günnemann S, Krieger R, Seidl T (2009) Relevant subspace clustering: Mining the most interesting non-redundant concepts in high dimensional data. In: ICDM, pp 377–386

28.

Müller E, Günnemann S, Assent I, Seidl T (2009) Evaluating clustering in subspace projections of high dimensional data. In: VLDB, pp 1270–1281

29.

Neville J, Adler M, Jensen D (2004) Spectral clustering with links and attributes. Dept of Computer Science, University of Massachusetts Amherst, Tech. rep

30.

Parsons L, Haque E, Liu H (2004) Subspace clustering for high dimensional data: a review. SIGKDD Explor 6(1):90–105CrossRef

31.

Pei J, Jiang D, Zhang A (2005) On mining cross-graph quasi-cliques. In: KDD, pp 228–238

32.

Procopiuc CM, Jones M, Agarwal PK, Murali TM (2002) A monte carlo algorithm for fast projective clustering. In: SIGMOD, pp 418–427

33.

Ruan J, Zhang W (2007) An efficient spectral algorithm for network community discovery and its applications to biological and social networks. In: ICDM, pp 643–648

34.

Rymon R (1992) Search through systematic set enumeration. In: K.R., pp 539–550

35.

Sequeira K, Zaki MJ (2004) Schism: a new approach for interesting subspace mining. In: ICDM, pp 186–193

36.

Shiga M, Takigawa I, Mamitsuka H (2007) A spectral clustering approach to optimally combining numerical vectors with a modular network. In: SIGKDD, pp 647–656

37.

Shyamsundar R, et al. (2005) A DNA microarray survey of gene expression in normal human tissues. Genome Biol 6(3):R22

38.

Silva A, Meira W Jr, Zaki M (2010) Structural correlation pattern mining for large graphs. In: Workshop on mining and learning with graphs, pp 119–126

39.

Stark C, et al. (2006) BioGRID: a general repository for interaction datasets. Nucleic Acids Res 34(suppl 1):D535–D539

40.

Ulitsky I, Shamir R (2007) Identification of functional modules using network topology and high-throughput data. BMC Syst Biol 1(1):8

41.

Wang J, Zeng Z, Zhou L (2006) Clan: an algorithm for mining closed cliques from large dense graph databases. In: ICDE, p 73

42.

Yiu ML, Mamoulis N (2003) Frequent-pattern based iterative projected clustering. In: ICDM, pp 689–692

43.

Yiu ML, Mamoulis N (2005) Iterative projected clustering by subspace mining. IEEE Trans Knowl Data Eng (TKDE) 17(2):176–189CrossRef

44.

Zeeberg B, et al (2003) GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol 4(4):R28

45.

Zeng Z, Wang J, Zhou L, Karypis G (2006) Coherent closed quasi-clique discovery from large dense graph databases. In: KDD, pp 797–802

46.

Zeng Z, Wang J, Zhou L, Karypis G (2007) Out-of-core coherent closed quasi-clique mining from large dense graph databases. TODS 32(2):13

47.

Zhang S, Yang J, Li S (2009) RING: an integrated method for frequent representative subgraph mining. In: ICDM, pp 1082–1087

48.

Zhou Y, Cheng H, Yu JX (2009) Graph clustering based on structural/attribute similarities. In: VLDB, pp 718–729

49.

Zhou Y, Cheng H, Yu JX (2010) Clustering large attributed graphs: an efficient incremental approach. In: ICDM, pp 689–698

Titel: GAMer: a synthesis of subspace clustering and dense subgraph mining
verfasst von: Stephan Günnemann
Ines Färber
Brigitte Boden
Thomas Seidl
Publikationsdatum: 01.08.2014
Verlag: Springer London
Erschienen in: Knowledge and Information Systems / Ausgabe 2/2014
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-013-0640-z

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 2/2014

Exploring feature sets for two-phase biomedical named entity recognition using semi-CRFs

Compressed representations for web and social graphs

A theoretical model for the automatic generation of tag clouds

Dual modularity optimization for detecting overlapping communities in bipartite networks

QoS-aware web service selection with negative selection algorithm

Computing and visualizing popular places