Skip to main content
Erschienen in: Journal of Classification 3/2022

06.07.2022

Community Detection in Feature-Rich Networks Using Data Recovery Approach

verfasst von: Boris Mirkin, Soroosh Shalileh

Erschienen in: Journal of Classification | Ausgabe 3/2022

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The problem of community detection in a network with features at its nodes takes into account both the graph structure and node features. The goal is to find relatively dense groups of interconnected entities sharing some features in common. There have been several approaches proposed for that. We apply the so-called data recovery approach to the problem by combining the least-squares recovery criteria for both the graph structure and node features. In this way, we obtain a new clustering criterion and a corresponding algorithm for finding clusters one-by-one. We show that our proposed method is effective on real-world data, as well as on synthetic data involving either only quantitative features or only categorical attributes or both. In the cases at which attributes are categorical, state-of-the-art algorithms are available. Our algorithm appears competitive against them.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Abrahao, B., Soundarajan, S., Hopcroft, J., & Kleinberg, R. (2012). On the separability of structural classes of communities. Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 624–632). Abrahao, B., Soundarajan, S., Hopcroft, J., & Kleinberg, R. (2012). On the separability of structural classes of communities. Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 624–632).
Zurück zum Zitat Abu-El-Haija Alipourfard, S., Harutyunyan, N., Kapoor, H., & Perozzi, B. A. (2018). Higher-order graph convolutional layer. Proceedings of the 32nd conference on neural information processing systems (NIPS) (pp. 1–6). Abu-El-Haija Alipourfard, S., Harutyunyan, N., Kapoor, H., & Perozzi, B. A. (2018). Higher-order graph convolutional layer. Proceedings of the 32nd conference on neural information processing systems (NIPS) (pp. 1–6).
Zurück zum Zitat Amorim, R., & Mirkin, B. (2012). Feature weighting and anomalous cluster initializing in k-means clustering minkowski metric. Pattern Recognition, 45(3), 1061–1075.CrossRef Amorim, R., & Mirkin, B. (2012). Feature weighting and anomalous cluster initializing in k-means clustering minkowski metric. Pattern Recognition, 45(3), 1061–1075.CrossRef
Zurück zum Zitat Blondel, V., Guillaume, J., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, P10008(10), 12–20.MATH Blondel, V., Guillaume, J., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, P10008(10), 12–20.MATH
Zurück zum Zitat Bojchevski, A. & Günnemanz, S. (2018). Bayesian robust attributed graph clustering: Joint learning of partial anomalies and group structure. Thirty-second AAAI conference on artificial intelligence (pp. 12–20). Bojchevski, A. & Günnemanz, S. (2018). Bayesian robust attributed graph clustering: Joint learning of partial anomalies and group structure. Thirty-second AAAI conference on artificial intelligence (pp. 12–20).
Zurück zum Zitat Boyd, S., & Vandenberghe, L. (2004). Convex optimization (1st ed.). Cambridge: Cambridge University Press.CrossRefMATH Boyd, S., & Vandenberghe, L. (2004). Convex optimization (1st ed.). Cambridge: Cambridge University Press.CrossRefMATH
Zurück zum Zitat Chang, S., Han, W., Tang, J., Qi, G., Aggarwal, C., & Huang, T. (2015). Heterogeneous network embedding via deep architectures. Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 119–128). Chang, S., Han, W., Tang, J., Qi, G., Aggarwal, C., & Huang, T. (2015). Heterogeneous network embedding via deep architectures. Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 119–128).
Zurück zum Zitat Cheng, H., Zhou, Y., & Yu, J. (2011). Clustering large attributed graphs: A balance between structural and attribute similarities. ACM Transactions on Knowledge Discovery from Data (TKDD), 5(2), 1–33.MathSciNetCrossRef Cheng, H., Zhou, Y., & Yu, J. (2011). Clustering large attributed graphs: A balance between structural and attribute similarities. ACM Transactions on Knowledge Discovery from Data (TKDD), 5(2), 1–33.MathSciNetCrossRef
Zurück zum Zitat Cheng, Y. Z. H. & Yu, J. (2010). Clustering large attributed graphs: An efficient incremental approach. IEEE international conference on data mining (pp. 689–698). Cheng, Y. Z. H. & Yu, J. (2010). Clustering large attributed graphs: An efficient incremental approach. IEEE international conference on data mining (pp. 689–698).
Zurück zum Zitat Chiang, M., & Mirkin, B. (2010). Intelligent choice of the number of clusters in k-means clustering: an experimental study with different cluster spreads. Journal of Classification, 27(1), 3–40.MathSciNetCrossRefMATH Chiang, M., & Mirkin, B. (2010). Intelligent choice of the number of clusters in k-means clustering: an experimental study with different cluster spreads. Journal of Classification, 27(1), 3–40.MathSciNetCrossRefMATH
Zurück zum Zitat Chunaev, P. (2020). Community detection in node-attributed social networks: a survey. Computer Science Review, 100286(37), 1–24.MathSciNetMATH Chunaev, P. (2020). Community detection in node-attributed social networks: a survey. Computer Science Review, 100286(37), 1–24.MathSciNetMATH
Zurück zum Zitat Citraro, S., & Rossetti, G. (2020). Identifying and exploiting homogeneous communities in labeled networks. Applied Network Science, 5(1), 1–203.CrossRef Citraro, S., & Rossetti, G. (2020). Identifying and exploiting homogeneous communities in labeled networks. Applied Network Science, 5(1), 1–203.CrossRef
Zurück zum Zitat Cover, T., & Thomas, J. (2006). Elements of information theory. New York: John Wiley and Sons.MATH Cover, T., & Thomas, J. (2006). Elements of information theory. New York: John Wiley and Sons.MATH
Zurück zum Zitat Cross, R., & Parker, A. (2004). The hidden power of social networks: Understanding how work really gets done in organizations. Cambridge: Harvard Business Press. Cross, R., & Parker, A. (2004). The hidden power of social networks: Understanding how work really gets done in organizations. Cambridge: Harvard Business Press.
Zurück zum Zitat Cruz, J., Bothorel, C. & Poulet, F. (2011). Entropy based community detection in augmented social networks. International conference on computational aspects of social networks (CASoN) (pp. 163–168). IEEE Cruz, J., Bothorel, C. & Poulet, F. (2011). Entropy based community detection in augmented social networks. International conference on computational aspects of social networks (CASoN) (pp. 163–168). IEEE
Zurück zum Zitat Dang, T., & Viennet, E. (2012). Community detection based on structural and attribute similarities. International conference on digital society (pp. 7–12). Dang, T., & Viennet, E. (2012). Community detection based on structural and attribute similarities. International conference on digital society (pp. 7–12).
Zurück zum Zitat Decelle, A., Krzakala, F., Moore, C., & Zdeborova, L. (2011). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review, 84(6), E066106. Decelle, A., Krzakala, F., Moore, C., & Zdeborova, L. (2011). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review, 84(6), E066106.
Zurück zum Zitat Depril, D., Mechelen, I. V., & Mirkin, B. (2008). Algorithms for additive clustering of rectangular data tables. Computational Statistics & Data Analysis, 52(11), 4923–4938.MathSciNetCrossRefMATH Depril, D., Mechelen, I. V., & Mirkin, B. (2008). Algorithms for additive clustering of rectangular data tables. Computational Statistics & Data Analysis, 52(11), 4923–4938.MathSciNetCrossRefMATH
Zurück zum Zitat Gao, C., & Ma, Z. (2021). Minimax rates in network analysis: Graphon estimation, community detection and hypothesis testing. Statistical Science, 36(1), 16–33.MathSciNetCrossRefMATH Gao, C., & Ma, Z. (2021). Minimax rates in network analysis: Graphon estimation, community detection and hypothesis testing. Statistical Science, 36(1), 16–33.MathSciNetCrossRefMATH
Zurück zum Zitat Green, P., & Silverman, B. (1993). Nonparametric regression and generalized linear models: a roughness penalty approach (1st ed.). USA: Chapman and Hall/CRC.CrossRefMATH Green, P., & Silverman, B. (1993). Nonparametric regression and generalized linear models: a roughness penalty approach (1st ed.). USA: Chapman and Hall/CRC.CrossRefMATH
Zurück zum Zitat He, D., Jin, D., Chen, Z., & Zhang, W. (2015). Identification of hybrid node and link communities in complex networks. Nature Scientific Reports, 8638. He, D., Jin, D., Chen, Z., & Zhang, W. (2015). Identification of hybrid node and link communities in complex networks. Nature Scientific Reports, 8638.
Zurück zum Zitat Holland, P., Laskey, K., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5(2), 109–137.MathSciNetCrossRef Holland, P., Laskey, K., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5(2), 109–137.MathSciNetCrossRef
Zurück zum Zitat Hu, Y., Li, M., Zhang, P., Fan, Y., & Di, Z. (2008). Community detection by signaling on complex networks. Physical Review E, 78(1), 16115.CrossRef Hu, Y., Li, M., Zhang, P., Fan, Y., & Di, Z. (2008). Community detection by signaling on complex networks. Physical Review E, 78(1), 16115.CrossRef
Zurück zum Zitat Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.CrossRefMATH Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.CrossRefMATH
Zurück zum Zitat Interdonato, R., Atzmueller, M., Gaito, S., Kanawati, R., Largeron, C., & Sala, A. (2019). Feature-rich networks: going beyond complex network topologies. Applied Network Science, 4(1), 1–13.CrossRef Interdonato, R., Atzmueller, M., Gaito, S., Kanawati, R., Largeron, C., & Sala, A. (2019). Feature-rich networks: going beyond complex network topologies. Applied Network Science, 4(1), 1–13.CrossRef
Zurück zum Zitat Jia, C., Li, Y., Carson, M., Wang, X., & Yu, J. (2017). Node attribute-enhanced community detection in complex networks. Scientific Reports, 7(1), 2626.CrossRef Jia, C., Li, Y., Carson, M., Wang, X., & Yu, J. (2017). Node attribute-enhanced community detection in complex networks. Scientific Reports, 7(1), 2626.CrossRef
Zurück zum Zitat Jin, H., Yu, W., & Li, S. (2018). A clustering algorithm for determining community structure in complex networks. Physica A: Statistical Mechanics and its Applications, 492, 980–993.CrossRefMATH Jin, H., Yu, W., & Li, S. (2018). A clustering algorithm for determining community structure in complex networks. Physica A: Statistical Mechanics and its Applications, 492, 980–993.CrossRefMATH
Zurück zum Zitat Karger, D. (1993). Global min-cuts in rnc, and other ramifications of a simple min-cut algorithm. SODA (pp. 21–30). Karger, D. (1993). Global min-cuts in rnc, and other ramifications of a simple min-cut algorithm. SODA (pp. 21–30).
Zurück zum Zitat Kovaleva, E., & Mirkin, B. (2015). Bisecting k-means and 1d projection divisive clustering: A unified framework and experimental comparison. Journal of Classification, 32(3), 414–442.MathSciNetCrossRefMATH Kovaleva, E., & Mirkin, B. (2015). Bisecting k-means and 1d projection divisive clustering: A unified framework and experimental comparison. Journal of Classification, 32(3), 414–442.MathSciNetCrossRefMATH
Zurück zum Zitat Larremore, D. B., Clauset, A., & Buckee, C. O. (2013). Network approach to analyzing highly recombinant malaria parasite genes. PLoS Computational Biology, 9(10), e1003268.CrossRef Larremore, D. B., Clauset, A., & Buckee, C. O. (2013). Network approach to analyzing highly recombinant malaria parasite genes. PLoS Computational Biology, 9(10), e1003268.CrossRef
Zurück zum Zitat Lazega, E. (2001). The collegial phenomenon: The social mechanisms of cooperation among peers in a corporate law partnership. New York: Oxford University Press.CrossRef Lazega, E. (2001). The collegial phenomenon: The social mechanisms of cooperation among peers in a corporate law partnership. New York: Oxford University Press.CrossRef
Zurück zum Zitat Li, J., Rong, Y., Cheng, H., Meng, H., Huang, W. & Huang, J. (2019). Semi-supervised graph classification: A hierarchical graph perspective. The World Wide Web conference(ACM) (pp. 972–982). Li, J., Rong, Y., Cheng, H., Meng, H., Huang, W. & Huang, J. (2019). Semi-supervised graph classification: A hierarchical graph perspective. The World Wide Web conference(ACM) (pp. 972–982).
Zurück zum Zitat Li, Y., Jia, C., & Yu, J. (2015). Parameter-free community detection method based on centrality and dispersion of nodes in complex networks. Physica A-Statistical Mechanics and Its Applications, 438, 321–334.CrossRef Li, Y., Jia, C., & Yu, J. (2015). Parameter-free community detection method based on centrality and dispersion of nodes in complex networks. Physica A-Statistical Mechanics and Its Applications, 438, 321–334.CrossRef
Zurück zum Zitat Mirkin, B. (1987). Additive clustering and qualitative factor analysis methods for similarity matrices. Journal of Classification, 4(1), 7–31.MathSciNetCrossRefMATH Mirkin, B. (1987). Additive clustering and qualitative factor analysis methods for similarity matrices. Journal of Classification, 4(1), 7–31.MathSciNetCrossRefMATH
Zurück zum Zitat Mirkin, B. (2012). Clustering: A data recovery approach (2nd ed.). Boca Raton: CRC Press.MATH Mirkin, B. (2012). Clustering: A data recovery approach (2nd ed.). Boca Raton: CRC Press.MATH
Zurück zum Zitat Mirkin, B., & Nascimento, S. (2012). Additive spectral method for fuzzy cluster analysis of similarity data including community structure and affinity matrices. Information Sciences, 183(1), 16–34.CrossRef Mirkin, B., & Nascimento, S. (2012). Additive spectral method for fuzzy cluster analysis of similarity data including community structure and affinity matrices. Information Sciences, 183(1), 16–34.CrossRef
Zurück zum Zitat Monge, A., & Elkan, C. (1997). An efficient domain-independent algorithm for detecting approximately duplicate database records. Monge, A., & Elkan, C. (1997). An efficient domain-independent algorithm for detecting approximately duplicate database records.
Zurück zum Zitat Neville, J., Adler, M., & Jensen, D. (2003). Clustering relational data using attribute and link information. Proceedings of the text mining and link analysis workshop, 18th international joint conference on artificial intelligence (pp. 9–15). Neville, J., Adler, M., & Jensen, D. (2003). Clustering relational data using attribute and link information. Proceedings of the text mining and link analysis workshop, 18th international joint conference on artificial intelligence (pp. 9–15).
Zurück zum Zitat Newman, M. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577–8582.CrossRef Newman, M. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577–8582.CrossRef
Zurück zum Zitat Newman, M., & Clauset, A. (2016). Structure and inference in annotated networks. Nature Communications, 7(1), 1–11.CrossRef Newman, M., & Clauset, A. (2016). Structure and inference in annotated networks. Nature Communications, 7(1), 1–11.CrossRef
Zurück zum Zitat Nooy, W. D., Mrvar, A., & Batagelj, V. (2004). Exploratory social network analysis with pajek. Cambridge: Cambridge University Press. Nooy, W. D., Mrvar, A., & Batagelj, V. (2004). Exploratory social network analysis with pajek. Cambridge: Cambridge University Press.
Zurück zum Zitat Nowicki, K., & Snijders, T. (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455), 1077–1087.MathSciNetCrossRefMATH Nowicki, K., & Snijders, T. (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455), 1077–1087.MathSciNetCrossRefMATH
Zurück zum Zitat Page, L., & S.B.R.M.T.W. (1999). Pagerank citation ranking: bringing order to the web. Technical Report, Stanford InfoLab. Page, L., & S.B.R.M.T.W. (1999). Pagerank citation ranking: bringing order to the web. Technical Report, Stanford InfoLab.
Zurück zum Zitat Peel, L., Larremore, D., & Clauset, A. (2017). The ground truth about metadata and community detection in networks. Science Advances, 3(5), e1602548.CrossRef Peel, L., Larremore, D., & Clauset, A. (2017). The ground truth about metadata and community detection in networks. Science Advances, 3(5), e1602548.CrossRef
Zurück zum Zitat Raghavan, U., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3), 036106.CrossRef Raghavan, U., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3), 036106.CrossRef
Zurück zum Zitat Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.CrossRef Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.CrossRef
Zurück zum Zitat Shi, W., Huang, L., Li, J. H., Wang, C., Tang, Y., & Fu, C. (2019). Network embedding via community based variational autoencoder. IEEE Access, 7, 25323–25333.CrossRef Shi, W., Huang, L., Li, J. H., Wang, C., Tang, Y., & Fu, C. (2019). Network embedding via community based variational autoencoder. IEEE Access, 7, 25323–25333.CrossRef
Zurück zum Zitat Stanley, N., Bonacci, T., Kwitt, R., Niethammer, M., & Mucha, P. (2019). Stochastic block models with multiple continuous attributes. Applied Network Science, 4(1), 1–22.CrossRef Stanley, N., Bonacci, T., Kwitt, R., Niethammer, M., & Mucha, P. (2019). Stochastic block models with multiple continuous attributes. Applied Network Science, 4(1), 1–22.CrossRef
Zurück zum Zitat Steinhaeuser, K., & Chawla, N. (2008). Community detection in a large real-world social network. In social computing, behavioral modeling, and prediction (pp. 168–175). Steinhaeuser, K., & Chawla, N. (2008). Community detection in a large real-world social network. In social computing, behavioral modeling, and prediction (pp. 168–175).
Zurück zum Zitat Wang, D., & Zhao, Y. (2019). Network community detection from the perspective of time series. Physica A: Statistical Mechanics and its Applications, 522, 205–214.CrossRef Wang, D., & Zhao, Y. (2019). Network community detection from the perspective of time series. Physica A: Statistical Mechanics and its Applications, 522, 205–214.CrossRef
Zurück zum Zitat Xu, Z., Ke, Y., Wang, Y., Cheng, H., & Cheng, J. (2012). A model-based approach to attributed graph clustering. Proceedings of the 2012 ACM SIGMOD international conference on management of data (pp. 505–516). ACM Xu, Z., Ke, Y., Wang, Y., Cheng, H., & Cheng, J. (2012). A model-based approach to attributed graph clustering. Proceedings of the 2012 ACM SIGMOD international conference on management of data (pp. 505–516). ACM
Zurück zum Zitat Yang, J., McAuley, J., & Leskovec, J. (2013). Community detection in networks with node attributes. IEEE 13th international conference on data mining (pp. 1151–1156). Yang, J., McAuley, J., & Leskovec, J. (2013). Community detection in networks with node attributes. IEEE 13th international conference on data mining (pp. 1151–1156).
Zurück zum Zitat Yin, Z., Gupta, M., Weninger, T., & Han, J. (2010). A unified framework for link recommendation using random walks. 2010 international conference on advances in social networks analysis and mining (pp. 152–159). IEEE Yin, Z., Gupta, M., Weninger, T., & Han, J. (2010). A unified framework for link recommendation using random walks. 2010 international conference on advances in social networks analysis and mining (pp. 152–159). IEEE
Zurück zum Zitat Zhang, Y., Levina, E., & Zhu, J. (2016). Community detection in networks with node features. Electronic Journal of Statistics, 10(2), 3153–3178.MathSciNetCrossRefMATH Zhang, Y., Levina, E., & Zhu, J. (2016). Community detection in networks with node features. Electronic Journal of Statistics, 10(2), 3153–3178.MathSciNetCrossRefMATH
Metadaten
Titel
Community Detection in Feature-Rich Networks Using Data Recovery Approach
verfasst von
Boris Mirkin
Soroosh Shalileh
Publikationsdatum
06.07.2022
Verlag
Springer US
Erschienen in
Journal of Classification / Ausgabe 3/2022
Print ISSN: 0176-4268
Elektronische ISSN: 1432-1343
DOI
https://doi.org/10.1007/s00357-022-09416-w

Weitere Artikel der Ausgabe 3/2022

Journal of Classification 3/2022 Zur Ausgabe

Premium Partner