Skip to main content
Top
Published in: Knowledge and Information Systems 2/2014

01-08-2014 | Regular Paper

GAMer: a synthesis of subspace clustering and dense subgraph mining

Authors: Stephan Günnemann, Ines Färber, Brigitte Boden, Thomas Seidl

Published in: Knowledge and Information Systems | Issue 2/2014

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

In this work, we propose a new method to find homogeneous object groups in a single vertex-labeled graph. The basic premise is that many prevalent datasets consist of multiple types of information: graph data to represent the relations between objects and attribute data to characterize the single objects. Analyzing both information types simultaneously can increase the expressiveness of the resulting patterns. Our patterns of interest are sets of objects that are densely connected within the associated graph and as well show high similarity regarding their attributes. As for attribute data it is known that full-space clustering often is futile, we have to analyze the similarity of objects regarding subsets of their attributes. In order to take full advantage of all present information, we combine the paradigms of dense subgraph mining and subspace clustering. For our approach, we face several challenges to achieve a sound combination of the two paradigms. We maximize our twofold clusters according to their density, size, and number of relevant dimensions. The optimization of these three objectives usually is conflicting; thus, we realize a trade-off between these characteristics to obtain meaningful patterns. We develop a redundancy model to confine the clustering to a manageable size by selecting only the most interesting clusters for the result set. We prove the complexity of our clustering model and we particularly focus on the exploration of various pruning strategies to design the efficient algorithm GAMer (Graph & Attribute Miner). In thorough experiments on synthetic and real world data we show that GAMer achieves low runtimes and high clustering qualities. We provide all datasets, measures, executables, and parameter settings on our website http://​dme.​rwth-aachen.​de/​gamer.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Abello J, Resende M, Sudarsky S et al. (2002) Massive quasi-clique detection. Lecture Notes in Computer Science pp. 598–612 Abello J, Resende M, Sudarsky S et al. (2002) Massive quasi-clique detection. Lecture Notes in Computer Science pp. 598–612
3.
go back to reference Aggarwal C, Wolf J, Yu P, Procopiuc C, Park J (1999) Fast algorithms for projected clustering. In: SIGMOD, pp 61–72 Aggarwal C, Wolf J, Yu P, Procopiuc C, Park J (1999) Fast algorithms for projected clustering. In: SIGMOD, pp 61–72
4.
go back to reference Al Hasan M, Chaoji V, Salem S, Besson J, Zaki M (2007) Origami: mining representative orthogonal graph patterns. In: ICDM, pp 153–162 Al Hasan M, Chaoji V, Salem S, Besson J, Zaki M (2007) Origami: mining representative orthogonal graph patterns. In: ICDM, pp 153–162
5.
go back to reference Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is ”nearest neighbor” meaningful?. In: ICDT, pp 217–235 Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is ”nearest neighbor” meaningful?. In: ICDT, pp 217–235
6.
go back to reference Condon A, Karp RM (2001) Algorithms for graph partitioning on the planted partition model. Random Struct Algorithms 18(2):116–140CrossRefMATHMathSciNet Condon A, Karp RM (2001) Algorithms for graph partitioning on the planted partition model. Random Struct Algorithms 18(2):116–140CrossRefMATHMathSciNet
7.
go back to reference Ding CHQ, He X, Zha H, Gu M, Simon HD (2001) A min-max cut algorithm for graph partitioning and data clustering. In: ICDM, pp 107–114 Ding CHQ, He X, Zha H, Gu M, Simon HD (2001) A min-max cut algorithm for graph partitioning and data clustering. In: ICDM, pp 107–114
8.
go back to reference Du N, Wu B, Pei X, Wang B, Xu L (2007) Community detection in large-scale social networks. In: WebKDD/SNA-KDD, pp 16–25 Du N, Wu B, Pei X, Wang B, Xu L (2007) Community detection in large-scale social networks. In: WebKDD/SNA-KDD, pp 16–25
9.
go back to reference Ester M, Ge R, Gao BJ, Hu Z, Ben-Moshe B (2006) Joint cluster analysis of attribute data and relationship data: the connected k-center problem. In: SDM Ester M, Ge R, Gao BJ, Hu Z, Ben-Moshe B (2006) Joint cluster analysis of attribute data and relationship data: the connected k-center problem. In: SDM
10.
go back to reference Garey M, Johnson D (1979) Computers and intractability: a guide to NP-completeness. W.H Freeman and Company, San FranciscoMATH Garey M, Johnson D (1979) Computers and intractability: a guide to NP-completeness. W.H Freeman and Company, San FranciscoMATH
11.
go back to reference Günnemann S, Färber I, Boden B, Seidl T (2010) Subspace clustering meets dense subgraph mining: a synthesis of two paradigms. In: ICDM, pp 845–850 Günnemann S, Färber I, Boden B, Seidl T (2010) Subspace clustering meets dense subgraph mining: a synthesis of two paradigms. In: ICDM, pp 845–850
12.
go back to reference Günnemann S, Färber I, Müller E, Assent I, Seidl T (2011) External evaluation measures for subspace clustering. In: CIKM, pp 1363–1372 Günnemann S, Färber I, Müller E, Assent I, Seidl T (2011) External evaluation measures for subspace clustering. In: CIKM, pp 1363–1372
13.
go back to reference Günnemann S, Kremer H, Seidl T (2010) Subspace clustering for uncertain data. In: SDM, pp 385–396 Günnemann S, Kremer H, Seidl T (2010) Subspace clustering for uncertain data. In: SDM, pp 385–396
14.
go back to reference Günnemann S, Müller E, Färber I, Seidl T (2009) Detection of orthogonal concepts in subspaces of high dimensional data. In: CIKM, pp 1317–1326 Günnemann S, Müller E, Färber I, Seidl T (2009) Detection of orthogonal concepts in subspaces of high dimensional data. In: CIKM, pp 1317–1326
15.
go back to reference Han J, Kamber M (2006) Data mining: concepts and techniques. Morgan Kaufmann, San Francisco Han J, Kamber M (2006) Data mining: concepts and techniques. Morgan Kaufmann, San Francisco
16.
go back to reference Hanisch D, Zien A, Zimmer R, Lengauer T (2002) Co-clustering of biological networks and gene expression data. Bioinformatics 18:145–154CrossRef Hanisch D, Zien A, Zimmer R, Lengauer T (2002) Co-clustering of biological networks and gene expression data. Bioinformatics 18:145–154CrossRef
17.
go back to reference Jolliffe I (2002) Principal component analysis, 2nd edn. Springer, New YorkMATH Jolliffe I (2002) Principal component analysis, 2nd edn. Springer, New YorkMATH
18.
go back to reference Kailing K, Kriegel HP, Kroeger P (2004) Density-connected subspace clustering for high-dimensional data. In: SDM, pp 246–257 Kailing K, Kriegel HP, Kroeger P (2004) Density-connected subspace clustering for high-dimensional data. In: SDM, pp 246–257
19.
go back to reference Kriegel HP, Kröger P, Zimek A (2009) Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. TKDD 3(1):1–58CrossRef Kriegel HP, Kröger P, Zimek A (2009) Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. TKDD 3(1):1–58CrossRef
20.
go back to reference Kubica J, Moore AW, Schneider JG (2003) Tractable group detection on large link data sets. In: ICDM, pp 573–576 Kubica J, Moore AW, Schneider JG (2003) Tractable group detection on large link data sets. In: ICDM, pp 573–576
21.
go back to reference Liu G, Wong L (2008) Effective pruning techniques for mining quasi-cliques. In: ECML/PKDD (2). pp 33–49 Liu G, Wong L (2008) Effective pruning techniques for mining quasi-cliques. In: ECML/PKDD (2). pp 33–49
22.
go back to reference Long B, Wu X, Zhang ZM, Yu PS (2006) Unsupervised learning on k-partite graphs. In: KDD, pp 317–326 Long B, Wu X, Zhang ZM, Yu PS (2006) Unsupervised learning on k-partite graphs. In: KDD, pp 317–326
23.
go back to reference Long B, Zhang ZM, Yu PS (2007) A probabilistic framework for relational clustering. In: KDD, pp 470–479 Long B, Zhang ZM, Yu PS (2007) A probabilistic framework for relational clustering. In: KDD, pp 470–479
24.
go back to reference Moise G, Sander J (2008) Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering. In: KDD, pp 533–541 Moise G, Sander J (2008) Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering. In: KDD, pp 533–541
25.
go back to reference Moise G, Sander J, Ester M (2006) P3C: a robust projected clustering algorithm. In: ICDM, pp 414–425 Moise G, Sander J, Ester M (2006) P3C: a robust projected clustering algorithm. In: ICDM, pp 414–425
26.
go back to reference Moser F, Colak R, Rafiey A, Ester M (2009) Mining cohesive patterns from graphs with feature vectors. In: SDM, pp 593–604 Moser F, Colak R, Rafiey A, Ester M (2009) Mining cohesive patterns from graphs with feature vectors. In: SDM, pp 593–604
27.
go back to reference Müller E, Assent I, Günnemann S, Krieger R, Seidl T (2009) Relevant subspace clustering: Mining the most interesting non-redundant concepts in high dimensional data. In: ICDM, pp 377–386 Müller E, Assent I, Günnemann S, Krieger R, Seidl T (2009) Relevant subspace clustering: Mining the most interesting non-redundant concepts in high dimensional data. In: ICDM, pp 377–386
28.
go back to reference Müller E, Günnemann S, Assent I, Seidl T (2009) Evaluating clustering in subspace projections of high dimensional data. In: VLDB, pp 1270–1281 Müller E, Günnemann S, Assent I, Seidl T (2009) Evaluating clustering in subspace projections of high dimensional data. In: VLDB, pp 1270–1281
29.
go back to reference Neville J, Adler M, Jensen D (2004) Spectral clustering with links and attributes. Dept of Computer Science, University of Massachusetts Amherst, Tech. rep Neville J, Adler M, Jensen D (2004) Spectral clustering with links and attributes. Dept of Computer Science, University of Massachusetts Amherst, Tech. rep
30.
go back to reference Parsons L, Haque E, Liu H (2004) Subspace clustering for high dimensional data: a review. SIGKDD Explor 6(1):90–105CrossRef Parsons L, Haque E, Liu H (2004) Subspace clustering for high dimensional data: a review. SIGKDD Explor 6(1):90–105CrossRef
31.
go back to reference Pei J, Jiang D, Zhang A (2005) On mining cross-graph quasi-cliques. In: KDD, pp 228–238 Pei J, Jiang D, Zhang A (2005) On mining cross-graph quasi-cliques. In: KDD, pp 228–238
32.
go back to reference Procopiuc CM, Jones M, Agarwal PK, Murali TM (2002) A monte carlo algorithm for fast projective clustering. In: SIGMOD, pp 418–427 Procopiuc CM, Jones M, Agarwal PK, Murali TM (2002) A monte carlo algorithm for fast projective clustering. In: SIGMOD, pp 418–427
33.
go back to reference Ruan J, Zhang W (2007) An efficient spectral algorithm for network community discovery and its applications to biological and social networks. In: ICDM, pp 643–648 Ruan J, Zhang W (2007) An efficient spectral algorithm for network community discovery and its applications to biological and social networks. In: ICDM, pp 643–648
34.
go back to reference Rymon R (1992) Search through systematic set enumeration. In: K.R., pp 539–550 Rymon R (1992) Search through systematic set enumeration. In: K.R., pp 539–550
35.
go back to reference Sequeira K, Zaki MJ (2004) Schism: a new approach for interesting subspace mining. In: ICDM, pp 186–193 Sequeira K, Zaki MJ (2004) Schism: a new approach for interesting subspace mining. In: ICDM, pp 186–193
36.
go back to reference Shiga M, Takigawa I, Mamitsuka H (2007) A spectral clustering approach to optimally combining numerical vectors with a modular network. In: SIGKDD, pp 647–656 Shiga M, Takigawa I, Mamitsuka H (2007) A spectral clustering approach to optimally combining numerical vectors with a modular network. In: SIGKDD, pp 647–656
37.
go back to reference Shyamsundar R, et al. (2005) A DNA microarray survey of gene expression in normal human tissues. Genome Biol 6(3):R22 Shyamsundar R, et al. (2005) A DNA microarray survey of gene expression in normal human tissues. Genome Biol 6(3):R22
38.
go back to reference Silva A, Meira W Jr, Zaki M (2010) Structural correlation pattern mining for large graphs. In: Workshop on mining and learning with graphs, pp 119–126 Silva A, Meira W Jr, Zaki M (2010) Structural correlation pattern mining for large graphs. In: Workshop on mining and learning with graphs, pp 119–126
39.
go back to reference Stark C, et al. (2006) BioGRID: a general repository for interaction datasets. Nucleic Acids Res 34(suppl 1):D535–D539 Stark C, et al. (2006) BioGRID: a general repository for interaction datasets. Nucleic Acids Res 34(suppl 1):D535–D539
40.
go back to reference Ulitsky I, Shamir R (2007) Identification of functional modules using network topology and high-throughput data. BMC Syst Biol 1(1):8 Ulitsky I, Shamir R (2007) Identification of functional modules using network topology and high-throughput data. BMC Syst Biol 1(1):8
41.
go back to reference Wang J, Zeng Z, Zhou L (2006) Clan: an algorithm for mining closed cliques from large dense graph databases. In: ICDE, p 73 Wang J, Zeng Z, Zhou L (2006) Clan: an algorithm for mining closed cliques from large dense graph databases. In: ICDE, p 73
42.
go back to reference Yiu ML, Mamoulis N (2003) Frequent-pattern based iterative projected clustering. In: ICDM, pp 689–692 Yiu ML, Mamoulis N (2003) Frequent-pattern based iterative projected clustering. In: ICDM, pp 689–692
43.
go back to reference Yiu ML, Mamoulis N (2005) Iterative projected clustering by subspace mining. IEEE Trans Knowl Data Eng (TKDE) 17(2):176–189CrossRef Yiu ML, Mamoulis N (2005) Iterative projected clustering by subspace mining. IEEE Trans Knowl Data Eng (TKDE) 17(2):176–189CrossRef
44.
go back to reference Zeeberg B, et al (2003) GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol 4(4):R28 Zeeberg B, et al (2003) GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol 4(4):R28
45.
go back to reference Zeng Z, Wang J, Zhou L, Karypis G (2006) Coherent closed quasi-clique discovery from large dense graph databases. In: KDD, pp 797–802 Zeng Z, Wang J, Zhou L, Karypis G (2006) Coherent closed quasi-clique discovery from large dense graph databases. In: KDD, pp 797–802
46.
go back to reference Zeng Z, Wang J, Zhou L, Karypis G (2007) Out-of-core coherent closed quasi-clique mining from large dense graph databases. TODS 32(2):13 Zeng Z, Wang J, Zhou L, Karypis G (2007) Out-of-core coherent closed quasi-clique mining from large dense graph databases. TODS 32(2):13
47.
go back to reference Zhang S, Yang J, Li S (2009) RING: an integrated method for frequent representative subgraph mining. In: ICDM, pp 1082–1087 Zhang S, Yang J, Li S (2009) RING: an integrated method for frequent representative subgraph mining. In: ICDM, pp 1082–1087
48.
go back to reference Zhou Y, Cheng H, Yu JX (2009) Graph clustering based on structural/attribute similarities. In: VLDB, pp 718–729 Zhou Y, Cheng H, Yu JX (2009) Graph clustering based on structural/attribute similarities. In: VLDB, pp 718–729
49.
go back to reference Zhou Y, Cheng H, Yu JX (2010) Clustering large attributed graphs: an efficient incremental approach. In: ICDM, pp 689–698 Zhou Y, Cheng H, Yu JX (2010) Clustering large attributed graphs: an efficient incremental approach. In: ICDM, pp 689–698
Metadata
Title
GAMer: a synthesis of subspace clustering and dense subgraph mining
Authors
Stephan Günnemann
Ines Färber
Brigitte Boden
Thomas Seidl
Publication date
01-08-2014
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 2/2014
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-013-0640-z

Other articles of this Issue 2/2014

Knowledge and Information Systems 2/2014 Go to the issue

Premium Partner