Skip to main content
Top
Published in: Journal of Classification 3/2022

06-07-2022

Community Detection in Feature-Rich Networks Using Data Recovery Approach

Authors: Boris Mirkin, Soroosh Shalileh

Published in: Journal of Classification | Issue 3/2022

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The problem of community detection in a network with features at its nodes takes into account both the graph structure and node features. The goal is to find relatively dense groups of interconnected entities sharing some features in common. There have been several approaches proposed for that. We apply the so-called data recovery approach to the problem by combining the least-squares recovery criteria for both the graph structure and node features. In this way, we obtain a new clustering criterion and a corresponding algorithm for finding clusters one-by-one. We show that our proposed method is effective on real-world data, as well as on synthetic data involving either only quantitative features or only categorical attributes or both. In the cases at which attributes are categorical, state-of-the-art algorithms are available. Our algorithm appears competitive against them.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
go back to reference Abrahao, B., Soundarajan, S., Hopcroft, J., & Kleinberg, R. (2012). On the separability of structural classes of communities. Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 624–632). Abrahao, B., Soundarajan, S., Hopcroft, J., & Kleinberg, R. (2012). On the separability of structural classes of communities. Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 624–632).
go back to reference Abu-El-Haija Alipourfard, S., Harutyunyan, N., Kapoor, H., & Perozzi, B. A. (2018). Higher-order graph convolutional layer. Proceedings of the 32nd conference on neural information processing systems (NIPS) (pp. 1–6). Abu-El-Haija Alipourfard, S., Harutyunyan, N., Kapoor, H., & Perozzi, B. A. (2018). Higher-order graph convolutional layer. Proceedings of the 32nd conference on neural information processing systems (NIPS) (pp. 1–6).
go back to reference Amorim, R., & Mirkin, B. (2012). Feature weighting and anomalous cluster initializing in k-means clustering minkowski metric. Pattern Recognition, 45(3), 1061–1075.CrossRef Amorim, R., & Mirkin, B. (2012). Feature weighting and anomalous cluster initializing in k-means clustering minkowski metric. Pattern Recognition, 45(3), 1061–1075.CrossRef
go back to reference Blondel, V., Guillaume, J., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, P10008(10), 12–20.MATH Blondel, V., Guillaume, J., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, P10008(10), 12–20.MATH
go back to reference Bojchevski, A. & Günnemanz, S. (2018). Bayesian robust attributed graph clustering: Joint learning of partial anomalies and group structure. Thirty-second AAAI conference on artificial intelligence (pp. 12–20). Bojchevski, A. & Günnemanz, S. (2018). Bayesian robust attributed graph clustering: Joint learning of partial anomalies and group structure. Thirty-second AAAI conference on artificial intelligence (pp. 12–20).
go back to reference Boyd, S., & Vandenberghe, L. (2004). Convex optimization (1st ed.). Cambridge: Cambridge University Press.CrossRefMATH Boyd, S., & Vandenberghe, L. (2004). Convex optimization (1st ed.). Cambridge: Cambridge University Press.CrossRefMATH
go back to reference Chang, S., Han, W., Tang, J., Qi, G., Aggarwal, C., & Huang, T. (2015). Heterogeneous network embedding via deep architectures. Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 119–128). Chang, S., Han, W., Tang, J., Qi, G., Aggarwal, C., & Huang, T. (2015). Heterogeneous network embedding via deep architectures. Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 119–128).
go back to reference Cheng, H., Zhou, Y., & Yu, J. (2011). Clustering large attributed graphs: A balance between structural and attribute similarities. ACM Transactions on Knowledge Discovery from Data (TKDD), 5(2), 1–33.MathSciNetCrossRef Cheng, H., Zhou, Y., & Yu, J. (2011). Clustering large attributed graphs: A balance between structural and attribute similarities. ACM Transactions on Knowledge Discovery from Data (TKDD), 5(2), 1–33.MathSciNetCrossRef
go back to reference Cheng, Y. Z. H. & Yu, J. (2010). Clustering large attributed graphs: An efficient incremental approach. IEEE international conference on data mining (pp. 689–698). Cheng, Y. Z. H. & Yu, J. (2010). Clustering large attributed graphs: An efficient incremental approach. IEEE international conference on data mining (pp. 689–698).
go back to reference Chiang, M., & Mirkin, B. (2010). Intelligent choice of the number of clusters in k-means clustering: an experimental study with different cluster spreads. Journal of Classification, 27(1), 3–40.MathSciNetCrossRefMATH Chiang, M., & Mirkin, B. (2010). Intelligent choice of the number of clusters in k-means clustering: an experimental study with different cluster spreads. Journal of Classification, 27(1), 3–40.MathSciNetCrossRefMATH
go back to reference Chunaev, P. (2020). Community detection in node-attributed social networks: a survey. Computer Science Review, 100286(37), 1–24.MathSciNetMATH Chunaev, P. (2020). Community detection in node-attributed social networks: a survey. Computer Science Review, 100286(37), 1–24.MathSciNetMATH
go back to reference Citraro, S., & Rossetti, G. (2020). Identifying and exploiting homogeneous communities in labeled networks. Applied Network Science, 5(1), 1–203.CrossRef Citraro, S., & Rossetti, G. (2020). Identifying and exploiting homogeneous communities in labeled networks. Applied Network Science, 5(1), 1–203.CrossRef
go back to reference Cover, T., & Thomas, J. (2006). Elements of information theory. New York: John Wiley and Sons.MATH Cover, T., & Thomas, J. (2006). Elements of information theory. New York: John Wiley and Sons.MATH
go back to reference Cross, R., & Parker, A. (2004). The hidden power of social networks: Understanding how work really gets done in organizations. Cambridge: Harvard Business Press. Cross, R., & Parker, A. (2004). The hidden power of social networks: Understanding how work really gets done in organizations. Cambridge: Harvard Business Press.
go back to reference Cruz, J., Bothorel, C. & Poulet, F. (2011). Entropy based community detection in augmented social networks. International conference on computational aspects of social networks (CASoN) (pp. 163–168). IEEE Cruz, J., Bothorel, C. & Poulet, F. (2011). Entropy based community detection in augmented social networks. International conference on computational aspects of social networks (CASoN) (pp. 163–168). IEEE
go back to reference Dang, T., & Viennet, E. (2012). Community detection based on structural and attribute similarities. International conference on digital society (pp. 7–12). Dang, T., & Viennet, E. (2012). Community detection based on structural and attribute similarities. International conference on digital society (pp. 7–12).
go back to reference Decelle, A., Krzakala, F., Moore, C., & Zdeborova, L. (2011). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review, 84(6), E066106. Decelle, A., Krzakala, F., Moore, C., & Zdeborova, L. (2011). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review, 84(6), E066106.
go back to reference Depril, D., Mechelen, I. V., & Mirkin, B. (2008). Algorithms for additive clustering of rectangular data tables. Computational Statistics & Data Analysis, 52(11), 4923–4938.MathSciNetCrossRefMATH Depril, D., Mechelen, I. V., & Mirkin, B. (2008). Algorithms for additive clustering of rectangular data tables. Computational Statistics & Data Analysis, 52(11), 4923–4938.MathSciNetCrossRefMATH
go back to reference Gao, C., & Ma, Z. (2021). Minimax rates in network analysis: Graphon estimation, community detection and hypothesis testing. Statistical Science, 36(1), 16–33.MathSciNetCrossRefMATH Gao, C., & Ma, Z. (2021). Minimax rates in network analysis: Graphon estimation, community detection and hypothesis testing. Statistical Science, 36(1), 16–33.MathSciNetCrossRefMATH
go back to reference Green, P., & Silverman, B. (1993). Nonparametric regression and generalized linear models: a roughness penalty approach (1st ed.). USA: Chapman and Hall/CRC.CrossRefMATH Green, P., & Silverman, B. (1993). Nonparametric regression and generalized linear models: a roughness penalty approach (1st ed.). USA: Chapman and Hall/CRC.CrossRefMATH
go back to reference He, D., Jin, D., Chen, Z., & Zhang, W. (2015). Identification of hybrid node and link communities in complex networks. Nature Scientific Reports, 8638. He, D., Jin, D., Chen, Z., & Zhang, W. (2015). Identification of hybrid node and link communities in complex networks. Nature Scientific Reports, 8638.
go back to reference Holland, P., Laskey, K., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5(2), 109–137.MathSciNetCrossRef Holland, P., Laskey, K., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5(2), 109–137.MathSciNetCrossRef
go back to reference Hu, Y., Li, M., Zhang, P., Fan, Y., & Di, Z. (2008). Community detection by signaling on complex networks. Physical Review E, 78(1), 16115.CrossRef Hu, Y., Li, M., Zhang, P., Fan, Y., & Di, Z. (2008). Community detection by signaling on complex networks. Physical Review E, 78(1), 16115.CrossRef
go back to reference Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.CrossRefMATH Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.CrossRefMATH
go back to reference Interdonato, R., Atzmueller, M., Gaito, S., Kanawati, R., Largeron, C., & Sala, A. (2019). Feature-rich networks: going beyond complex network topologies. Applied Network Science, 4(1), 1–13.CrossRef Interdonato, R., Atzmueller, M., Gaito, S., Kanawati, R., Largeron, C., & Sala, A. (2019). Feature-rich networks: going beyond complex network topologies. Applied Network Science, 4(1), 1–13.CrossRef
go back to reference Jia, C., Li, Y., Carson, M., Wang, X., & Yu, J. (2017). Node attribute-enhanced community detection in complex networks. Scientific Reports, 7(1), 2626.CrossRef Jia, C., Li, Y., Carson, M., Wang, X., & Yu, J. (2017). Node attribute-enhanced community detection in complex networks. Scientific Reports, 7(1), 2626.CrossRef
go back to reference Jin, H., Yu, W., & Li, S. (2018). A clustering algorithm for determining community structure in complex networks. Physica A: Statistical Mechanics and its Applications, 492, 980–993.CrossRefMATH Jin, H., Yu, W., & Li, S. (2018). A clustering algorithm for determining community structure in complex networks. Physica A: Statistical Mechanics and its Applications, 492, 980–993.CrossRefMATH
go back to reference Karger, D. (1993). Global min-cuts in rnc, and other ramifications of a simple min-cut algorithm. SODA (pp. 21–30). Karger, D. (1993). Global min-cuts in rnc, and other ramifications of a simple min-cut algorithm. SODA (pp. 21–30).
go back to reference Kovaleva, E., & Mirkin, B. (2015). Bisecting k-means and 1d projection divisive clustering: A unified framework and experimental comparison. Journal of Classification, 32(3), 414–442.MathSciNetCrossRefMATH Kovaleva, E., & Mirkin, B. (2015). Bisecting k-means and 1d projection divisive clustering: A unified framework and experimental comparison. Journal of Classification, 32(3), 414–442.MathSciNetCrossRefMATH
go back to reference Larremore, D. B., Clauset, A., & Buckee, C. O. (2013). Network approach to analyzing highly recombinant malaria parasite genes. PLoS Computational Biology, 9(10), e1003268.CrossRef Larremore, D. B., Clauset, A., & Buckee, C. O. (2013). Network approach to analyzing highly recombinant malaria parasite genes. PLoS Computational Biology, 9(10), e1003268.CrossRef
go back to reference Lazega, E. (2001). The collegial phenomenon: The social mechanisms of cooperation among peers in a corporate law partnership. New York: Oxford University Press.CrossRef Lazega, E. (2001). The collegial phenomenon: The social mechanisms of cooperation among peers in a corporate law partnership. New York: Oxford University Press.CrossRef
go back to reference Li, J., Rong, Y., Cheng, H., Meng, H., Huang, W. & Huang, J. (2019). Semi-supervised graph classification: A hierarchical graph perspective. The World Wide Web conference(ACM) (pp. 972–982). Li, J., Rong, Y., Cheng, H., Meng, H., Huang, W. & Huang, J. (2019). Semi-supervised graph classification: A hierarchical graph perspective. The World Wide Web conference(ACM) (pp. 972–982).
go back to reference Li, Y., Jia, C., & Yu, J. (2015). Parameter-free community detection method based on centrality and dispersion of nodes in complex networks. Physica A-Statistical Mechanics and Its Applications, 438, 321–334.CrossRef Li, Y., Jia, C., & Yu, J. (2015). Parameter-free community detection method based on centrality and dispersion of nodes in complex networks. Physica A-Statistical Mechanics and Its Applications, 438, 321–334.CrossRef
go back to reference Mirkin, B. (1987). Additive clustering and qualitative factor analysis methods for similarity matrices. Journal of Classification, 4(1), 7–31.MathSciNetCrossRefMATH Mirkin, B. (1987). Additive clustering and qualitative factor analysis methods for similarity matrices. Journal of Classification, 4(1), 7–31.MathSciNetCrossRefMATH
go back to reference Mirkin, B. (2012). Clustering: A data recovery approach (2nd ed.). Boca Raton: CRC Press.MATH Mirkin, B. (2012). Clustering: A data recovery approach (2nd ed.). Boca Raton: CRC Press.MATH
go back to reference Mirkin, B., & Nascimento, S. (2012). Additive spectral method for fuzzy cluster analysis of similarity data including community structure and affinity matrices. Information Sciences, 183(1), 16–34.CrossRef Mirkin, B., & Nascimento, S. (2012). Additive spectral method for fuzzy cluster analysis of similarity data including community structure and affinity matrices. Information Sciences, 183(1), 16–34.CrossRef
go back to reference Monge, A., & Elkan, C. (1997). An efficient domain-independent algorithm for detecting approximately duplicate database records. Monge, A., & Elkan, C. (1997). An efficient domain-independent algorithm for detecting approximately duplicate database records.
go back to reference Neville, J., Adler, M., & Jensen, D. (2003). Clustering relational data using attribute and link information. Proceedings of the text mining and link analysis workshop, 18th international joint conference on artificial intelligence (pp. 9–15). Neville, J., Adler, M., & Jensen, D. (2003). Clustering relational data using attribute and link information. Proceedings of the text mining and link analysis workshop, 18th international joint conference on artificial intelligence (pp. 9–15).
go back to reference Newman, M. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577–8582.CrossRef Newman, M. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577–8582.CrossRef
go back to reference Newman, M., & Clauset, A. (2016). Structure and inference in annotated networks. Nature Communications, 7(1), 1–11.CrossRef Newman, M., & Clauset, A. (2016). Structure and inference in annotated networks. Nature Communications, 7(1), 1–11.CrossRef
go back to reference Nooy, W. D., Mrvar, A., & Batagelj, V. (2004). Exploratory social network analysis with pajek. Cambridge: Cambridge University Press. Nooy, W. D., Mrvar, A., & Batagelj, V. (2004). Exploratory social network analysis with pajek. Cambridge: Cambridge University Press.
go back to reference Nowicki, K., & Snijders, T. (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455), 1077–1087.MathSciNetCrossRefMATH Nowicki, K., & Snijders, T. (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455), 1077–1087.MathSciNetCrossRefMATH
go back to reference Page, L., & S.B.R.M.T.W. (1999). Pagerank citation ranking: bringing order to the web. Technical Report, Stanford InfoLab. Page, L., & S.B.R.M.T.W. (1999). Pagerank citation ranking: bringing order to the web. Technical Report, Stanford InfoLab.
go back to reference Peel, L., Larremore, D., & Clauset, A. (2017). The ground truth about metadata and community detection in networks. Science Advances, 3(5), e1602548.CrossRef Peel, L., Larremore, D., & Clauset, A. (2017). The ground truth about metadata and community detection in networks. Science Advances, 3(5), e1602548.CrossRef
go back to reference Raghavan, U., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3), 036106.CrossRef Raghavan, U., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3), 036106.CrossRef
go back to reference Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.CrossRef Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.CrossRef
go back to reference Shi, W., Huang, L., Li, J. H., Wang, C., Tang, Y., & Fu, C. (2019). Network embedding via community based variational autoencoder. IEEE Access, 7, 25323–25333.CrossRef Shi, W., Huang, L., Li, J. H., Wang, C., Tang, Y., & Fu, C. (2019). Network embedding via community based variational autoencoder. IEEE Access, 7, 25323–25333.CrossRef
go back to reference Stanley, N., Bonacci, T., Kwitt, R., Niethammer, M., & Mucha, P. (2019). Stochastic block models with multiple continuous attributes. Applied Network Science, 4(1), 1–22.CrossRef Stanley, N., Bonacci, T., Kwitt, R., Niethammer, M., & Mucha, P. (2019). Stochastic block models with multiple continuous attributes. Applied Network Science, 4(1), 1–22.CrossRef
go back to reference Steinhaeuser, K., & Chawla, N. (2008). Community detection in a large real-world social network. In social computing, behavioral modeling, and prediction (pp. 168–175). Steinhaeuser, K., & Chawla, N. (2008). Community detection in a large real-world social network. In social computing, behavioral modeling, and prediction (pp. 168–175).
go back to reference Wang, D., & Zhao, Y. (2019). Network community detection from the perspective of time series. Physica A: Statistical Mechanics and its Applications, 522, 205–214.CrossRef Wang, D., & Zhao, Y. (2019). Network community detection from the perspective of time series. Physica A: Statistical Mechanics and its Applications, 522, 205–214.CrossRef
go back to reference Xu, Z., Ke, Y., Wang, Y., Cheng, H., & Cheng, J. (2012). A model-based approach to attributed graph clustering. Proceedings of the 2012 ACM SIGMOD international conference on management of data (pp. 505–516). ACM Xu, Z., Ke, Y., Wang, Y., Cheng, H., & Cheng, J. (2012). A model-based approach to attributed graph clustering. Proceedings of the 2012 ACM SIGMOD international conference on management of data (pp. 505–516). ACM
go back to reference Yang, J., McAuley, J., & Leskovec, J. (2013). Community detection in networks with node attributes. IEEE 13th international conference on data mining (pp. 1151–1156). Yang, J., McAuley, J., & Leskovec, J. (2013). Community detection in networks with node attributes. IEEE 13th international conference on data mining (pp. 1151–1156).
go back to reference Yin, Z., Gupta, M., Weninger, T., & Han, J. (2010). A unified framework for link recommendation using random walks. 2010 international conference on advances in social networks analysis and mining (pp. 152–159). IEEE Yin, Z., Gupta, M., Weninger, T., & Han, J. (2010). A unified framework for link recommendation using random walks. 2010 international conference on advances in social networks analysis and mining (pp. 152–159). IEEE
go back to reference Zhang, Y., Levina, E., & Zhu, J. (2016). Community detection in networks with node features. Electronic Journal of Statistics, 10(2), 3153–3178.MathSciNetCrossRefMATH Zhang, Y., Levina, E., & Zhu, J. (2016). Community detection in networks with node features. Electronic Journal of Statistics, 10(2), 3153–3178.MathSciNetCrossRefMATH
Metadata
Title
Community Detection in Feature-Rich Networks Using Data Recovery Approach
Authors
Boris Mirkin
Soroosh Shalileh
Publication date
06-07-2022
Publisher
Springer US
Published in
Journal of Classification / Issue 3/2022
Print ISSN: 0176-4268
Electronic ISSN: 1432-1343
DOI
https://doi.org/10.1007/s00357-022-09416-w

Other articles of this Issue 3/2022

Journal of Classification 3/2022 Go to the issue

Premium Partner