Published in: Knowledge and Information Systems 2/2016

01.02.2016 | Regular Paper

HICC: an entropy splitting-based framework for hierarchical co-clustering

Authors: Wei Cheng, Xiang Zhang, Feng Pan, Wei Wang

Abstract

Two-dimensional contingency tables, or co-occurrence matrices, arise frequently in important applications such as text analysis and web-log mining. As a fundamental research topic, co-clustering aims to generate a meaningful partition of the contingency table that reveals hidden relationships between rows and columns. Traditional co-clustering algorithms usually produce a predefined number of flat partitions of rows and columns, which does not reveal the relationships among clusters. To address this limitation, hierarchical co-clustering algorithms have recently attracted considerable research interest. Although successful in various applications, existing hierarchical co-clustering algorithms are usually based on heuristics and lack a solid theoretical foundation. In this paper, we present a new co-clustering algorithm, HICC, with a solid theoretical foundation. It simultaneously constructs a hierarchical structure of both row and column clusters that retains sufficient mutual information between the rows and columns of the contingency table. An efficient and effective greedy algorithm is developed, which grows the co-cluster hierarchy by successively performing the row-wise or column-wise split that yields the maximal mutual information gain. Extensive experiments on both synthetic and real datasets demonstrate that our algorithm reveals essential relationships among row (and column) clusters and achieves better clustering precision than existing algorithms. Moreover, experiments on real data show that HICC can effectively reveal hidden relationships between rows and columns in the contingency table.
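The greedy split step described in the abstract can be illustrated with a short sketch. The Python code below is a minimal, hypothetical illustration rather than the authors' implementation: it aggregates the joint distribution over the current row and column clusters, measures the mutual information that co-clustering retains, and keeps the candidate row split with the largest gain. The 2-means bisection used to propose candidate splits is a placeholder heuristic (the paper defines its own split procedure), and the column-wise step is symmetric.

import numpy as np
from sklearn.cluster import KMeans

def cluster_joint(P, row_labels, col_labels):
    # Aggregate the joint distribution P(x, y) into P(row cluster, column cluster).
    R = np.eye(row_labels.max() + 1)[row_labels]   # n x r one-hot indicator
    C = np.eye(col_labels.max() + 1)[col_labels]   # m x c one-hot indicator
    return R.T @ P @ C

def mutual_information(Q):
    # I(R; C) in bits for a (possibly unnormalized) joint table Q over cluster pairs.
    Q = Q / Q.sum()
    outer = Q.sum(axis=1, keepdims=True) @ Q.sum(axis=0, keepdims=True)
    nz = Q > 0
    return float((Q[nz] * np.log2(Q[nz] / outer[nz])).sum())

def greedy_row_split(P, row_labels, col_labels):
    # Try bisecting each row cluster; return the relabeling with the largest
    # mutual-information gain (a column-wise split step would mirror this).
    base = mutual_information(cluster_joint(P, row_labels, col_labels))
    best_gain, best_labels = 0.0, row_labels
    for k in np.unique(row_labels):
        members = np.where(row_labels == k)[0]
        if len(members) < 2:
            continue
        # Placeholder split heuristic: 2-means on the rows' conditional distributions.
        cond = P[members] / np.maximum(P[members].sum(axis=1, keepdims=True), 1e-12)
        halves = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(cond)
        if halves.min() == halves.max():
            continue  # degenerate split, skip
        cand = row_labels.copy()
        cand[members[halves == 1]] = row_labels.max() + 1  # open a new cluster
        gain = mutual_information(cluster_joint(P, cand, col_labels)) - base
        if gain > best_gain:
            best_gain, best_labels = gain, cand
    return best_labels, best_gain

In an outer loop, one would repeatedly apply such row-wise and column-wise split steps (choosing whichever yields the larger gain) until the desired hierarchy depth or an information-retention criterion is reached.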

Metadata
Title
HICC: an entropy splitting-based framework for hierarchical co-clustering
Authors
Wei Cheng
Xiang Zhang
Feng Pan
Wei Wang
Publication date
01.02.2016
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 2/2016
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-015-0823-x
