Published in: Data Mining and Knowledge Discovery 4/2019

10 April 2019

Clustering for heterogeneous information networks with extended star-structure

Authors: Jian-Ping Mei, Huajiang Lv, Lianghuai Yang, Yanjun Li

Abstract

Clustering the objects of a heterogeneous information network, in which objects of different types are linked to each other, is an important problem in heterogeneous information network analysis. Several existing clustering approaches deal with star-structured information networks with different central-attribute relations. In real applications, homogeneous links between central objects may also be available and useful for clustering. In this paper, we propose a new approach called CluEstar for clustering networks with an extended star-structure (E-Star), which extends the classic star-structure by also including the central–central relation, i.e., links between objects of the central type. In CluEstar, every object has a ranking with respect to each cluster; this ranking reflects the object's within-cluster representativeness and determines the clusters of the objects it is linked to. A novel objective function is proposed for clustering E-Star networks, formulating both central-attribute and central–central links in an efficient way. Results of extensive experiments on benchmark data sets show that the proposed approach compares favorably with existing ones for clustering E-Star networks, in both clustering quality and efficiency.
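
To make the E-Star setting concrete, the following is a minimal Python sketch of a generic ranking-style alternating scheme on a toy network. It is not the CluEstar objective function, which the abstract does not spell out. The function name `cluster_estar`, the weight `alpha`, the scoring rule and the toy data are all assumptions made for illustration. The sketch only shows how central-attribute links and central–central links can both contribute to the cluster assignment of central objects.

```python
def cluster_estar(attr_links, central_links, init_labels, k, alpha=1.0, max_iter=20):
    """Toy alternating scheme on an extended star network (illustration only).

    attr_links: one matrix per attribute type; attr_links[t][i][j] is the
        link weight between central object i and attribute object j.
    central_links: square matrix of central-central link weights.
    init_labels: initial cluster index of each central object.
    alpha: hypothetical weight of the central-central links.
    """
    n = len(init_labels)
    labels = list(init_labels)
    for _ in range(max_iter):
        # Step 1: rank attribute objects within each cluster by the share of
        # link weight they receive from that cluster's central objects.
        rankings = []
        for links in attr_links:
            m = len(links[0])
            per_cluster = []
            for c in range(k):
                mass = [sum(links[i][j] for i in range(n) if labels[i] == c)
                        for j in range(m)]
                total = sum(mass)
                per_cluster.append([x / total if total else 0.0 for x in mass])
            rankings.append(per_cluster)

        # Step 2: score every central object against every cluster using both
        # kinds of links, then move it to the best-scoring cluster.
        new_labels = []
        for i in range(n):
            scores = []
            for c in range(k):
                attr_score = sum(
                    links[i][j] * rankings[t][c][j]
                    for t, links in enumerate(attr_links)
                    for j in range(len(links[i]))
                )
                link_score = sum(
                    central_links[i][h]
                    for h in range(n)
                    if h != i and labels[h] == c
                )
                scores.append(attr_score + alpha * link_score)
            new_labels.append(scores.index(max(scores)))

        if new_labels == labels:  # assignments are stable, stop iterating
            break
        labels = new_labels
    return labels


# Toy E-Star network: 4 central objects (say, papers), one attribute type
# with 4 attribute objects (say, terms), plus paper-paper links.
paper_term = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
]
paper_paper = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

# Start from an assignment in which paper 2 is placed in the wrong cluster.
labels = cluster_estar([paper_term], paper_paper, init_labels=[0, 0, 0, 1], k=2)
print(labels)  # [0, 0, 1, 1]
```

In this toy run the misplaced central object moves to the cluster whose attribute ranking and linked neighbours match it best. A synchronous update like this one can oscillate for some initializations, so a practical method would need a well-defined objective, as the paper proposes.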

Metadata
Title
Clustering for heterogeneous information networks with extended star-structure
Authors
Jian-Ping Mei
Huajiang Lv
Lianghuai Yang
Yanjun Li
Publication date
10 April 2019
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 4/2019
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-019-00626-2
