Skip to main content
Erschienen in:
Buchtitelbild

2019 | OriginalPaper | Buchkapitel

Using Synthetic Networks for Parameter Tuning in Community Detection

verfasst von : Liudmila Prokhorenkova

Erschienen in: Algorithms and Models for the Web Graph

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Community detection is one of the most important and challenging problems in network analysis. However, real-world networks may have very different structural properties and communities of various nature. As a result, it is hard (or even impossible) to develop one algorithm suitable for all datasets. A standard machine learning tool is to consider a parametric algorithm and choose its parameters based on the dataset at hand. However, this approach is not applicable to community detection since usually no labeled data is available for such parameter tuning. In this paper, we propose a simple and effective procedure allowing to tune hyperparameters of any given community detection algorithm without requiring any labeled data. The core idea is to generate a synthetic network with properties similar to a given real-world one, but with known communities. It turns out that tuning parameters on such synthetic graph also improves the quality for a given real-world network. To illustrate the effectiveness of the proposed algorithm, we show significant improvements obtained for several well-known parametric community detection algorithms on a variety of synthetic and real-world datasets.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
2
Note that \(\hat{\mu }>0.5\) does not mean the absence of community structure since usually a community is much smaller than the rest of the network and even if more than a half of the edges for each vertex go outside the community, the density of edges inside the community is still large.
 
4
The results in Tables 2, 3 and 4 are rounded to two decimals, so there may be a statistically significant improvement even when the numbers in the table are equal. Also, standard deviation less than 0.005 is rounded to zero.
 
5
For small datasets, our results for the default Louvain algorithm may differ from the ones reported in [24]. The reason is the high values of standard deviation. The authors of [24] averaged the results over 5 runs of the algorithm, while we use more runs, i.e., our average values are more stable.
 
Literatur
1.
Zurück zum Zitat Adamic, L.A., Glance, N.: The political blogosphere and the 2004 us election: divided they blog. In: Proceedings of the 3rd International Workshop on Link Discovery, pp. 36–43. ACM (2005) Adamic, L.A., Glance, N.: The political blogosphere and the 2004 us election: divided they blog. In: Proceedings of the 3rd International Workshop on Link Discovery, pp. 36–43. ACM (2005)
2.
Zurück zum Zitat Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(Feb), 281–305 (2012)MathSciNetMATH Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(Feb), 281–305 (2012)MathSciNetMATH
3.
Zurück zum Zitat Bickel, P.J., Chen, A.: A nonparametric view of network models and newman-girvan and other modularities. Proc. Natl. Acad. Sci. 106(50), 21068–21073 (2009)CrossRef Bickel, P.J., Chen, A.: A nonparametric view of network models and newman-girvan and other modularities. Proc. Natl. Acad. Sci. 106(50), 21068–21073 (2009)CrossRef
4.
Zurück zum Zitat Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. 2008(10), P10008 (2008)CrossRef Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. 2008(10), P10008 (2008)CrossRef
5.
Zurück zum Zitat Boguná, M., Papadopoulos, F., Krioukov, D.: Sustaining the internet with hyperbolic mapping. Nat. Commun. 1, 62 (2010)CrossRef Boguná, M., Papadopoulos, F., Krioukov, D.: Sustaining the internet with hyperbolic mapping. Nat. Commun. 1, 62 (2010)CrossRef
6.
Zurück zum Zitat Coscia, M., Giannotti, F., Pedreschi, D.: A classification for community discovery methods in complex networks. Stat. Anal. Data Min.: ASA Data Sci. J. 4(5), 512–546 (2011)MathSciNetCrossRef Coscia, M., Giannotti, F., Pedreschi, D.: A classification for community discovery methods in complex networks. Stat. Anal. Data Min.: ASA Data Sci. J. 4(5), 512–546 (2011)MathSciNetCrossRef
8.
Zurück zum Zitat Fortunato, S., Barthélemy, M.: Resolution limit in community detection. Proc. Natl. Acad. Sci. 104(1), 36–41 (2007)CrossRef Fortunato, S., Barthélemy, M.: Resolution limit in community detection. Proc. Natl. Acad. Sci. 104(1), 36–41 (2007)CrossRef
9.
10.
Zurück zum Zitat Golovin, D., Solnik, B., Moitra, S., Kochanski, G., Karro, J., Sculley, D.: Google vizier: a service for black-box optimization. In: International Conference on Knowledge Discovery and Data Mining, pp. 1487–1495. ACM (2017) Golovin, D., Solnik, B., Moitra, S., Kochanski, G., Karro, J., Sculley, D.: Google vizier: a service for black-box optimization. In: International Conference on Knowledge Discovery and Data Mining, pp. 1487–1495. ACM (2017)
12.
Zurück zum Zitat Karrer, B., Newman, M.E.: Stochastic blockmodels and community structure in networks. Phys. Rev. E 83(1), 016107 (2011)MathSciNetCrossRef Karrer, B., Newman, M.E.: Stochastic blockmodels and community structure in networks. Phys. Rev. E 83(1), 016107 (2011)MathSciNetCrossRef
13.
Zurück zum Zitat Lancichinetti, A., Fortunato, S.: Limits of modularity maximization in community detection. Phys. Rev. E 84(6), 066122 (2011)CrossRef Lancichinetti, A., Fortunato, S.: Limits of modularity maximization in community detection. Phys. Rev. E 84(6), 066122 (2011)CrossRef
14.
Zurück zum Zitat Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4), 046110 (2008)CrossRef Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4), 046110 (2008)CrossRef
15.
Zurück zum Zitat Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: densification and shrinking diameters. ACM Trans. Knowl. Discov. Data (TKDD) 1(1), 2 (2007)CrossRef Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: densification and shrinking diameters. ACM Trans. Knowl. Discov. Data (TKDD) 1(1), 2 (2007)CrossRef
16.
Zurück zum Zitat Lusseau, D., Schneider, K., Boisseau, O.J., Haase, P., Slooten, E., Dawson, S.M.: The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol. 54(4), 396–405 (2003)CrossRef Lusseau, D., Schneider, K., Boisseau, O.J., Haase, P., Slooten, E., Dawson, S.M.: The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol. 54(4), 396–405 (2003)CrossRef
17.
Zurück zum Zitat Malliaros, F.D., Vazirgiannis, M.: Clustering and community detection in directed networks: a survey. Phys. Rep. 533(4), 95–142 (2013)MathSciNetCrossRef Malliaros, F.D., Vazirgiannis, M.: Clustering and community detection in directed networks: a survey. Phys. Rep. 533(4), 95–142 (2013)MathSciNetCrossRef
18.
Zurück zum Zitat Miasnikof, P., Prokhorenkova, L., Shestopaloff, A.Y., Raigorodskii, A.: A statistical test of heterogeneous subgraph densities to assess clusterability. In: 13th LION Learning and Intelligent OptimizatioN Conference. Springer (2019) Miasnikof, P., Prokhorenkova, L., Shestopaloff, A.Y., Raigorodskii, A.: A statistical test of heterogeneous subgraph densities to assess clusterability. In: 13th LION Learning and Intelligent OptimizatioN Conference. Springer (2019)
19.
Zurück zum Zitat Molloy, M., Reed, B.: A critical point for random graphs with a given degree sequence. Random Struct. Algorithms 6(2–3), 161–180 (1995)MathSciNetCrossRef Molloy, M., Reed, B.: A critical point for random graphs with a given degree sequence. Random Struct. Algorithms 6(2–3), 161–180 (1995)MathSciNetCrossRef
20.
Zurück zum Zitat Newman, M.E.: Modularity and community structure in networks. Proc. Natl. Acad. Sci. 103(23), 8577–8582 (2006)CrossRef Newman, M.E.: Modularity and community structure in networks. Proc. Natl. Acad. Sci. 103(23), 8577–8582 (2006)CrossRef
21.
Zurück zum Zitat Newman, M.E., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)CrossRef Newman, M.E., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)CrossRef
22.
Zurück zum Zitat Newman, M.: Community detection in networks: modularity optimization and maximum likelihood are equivalent. arXiv preprint arXiv:1606.02319 (2016) Newman, M.: Community detection in networks: modularity optimization and maximum likelihood are equivalent. arXiv preprint arXiv:​1606.​02319 (2016)
23.
Zurück zum Zitat Peel, L., Larremore, D.B., Clauset, A.: The ground truth about metadata and community detection in networks. Sci. Adv. 3(5), e1602548 (2017)CrossRef Peel, L., Larremore, D.B., Clauset, A.: The ground truth about metadata and community detection in networks. Sci. Adv. 3(5), e1602548 (2017)CrossRef
24.
Zurück zum Zitat Prokhorenkova, L., Tikhonov, A.: Community detection through likelihood optimization: in search of a sound model. In: The World Wide Web Conference, pp. 1498–1508. ACM (2019) Prokhorenkova, L., Tikhonov, A.: Community detection through likelihood optimization: in search of a sound model. In: The World Wide Web Conference, pp. 1498–1508. ACM (2019)
25.
Zurück zum Zitat Snoek, J., et al.: Scalable Bayesian optimization using deep neural networks. In: International Conference on Machine Learning, pp. 2171–2180 (2015) Snoek, J., et al.: Scalable Bayesian optimization using deep neural networks. In: International Conference on Machine Learning, pp. 2171–2180 (2015)
26.
Zurück zum Zitat Šubelj, L., Bajec, M.: Model of complex networks based on citation dynamics. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 527–530. ACM (2013) Šubelj, L., Bajec, M.: Model of complex networks based on citation dynamics. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 527–530. ACM (2013)
27.
Zurück zum Zitat Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33(4), 452–473 (1977)CrossRef Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33(4), 452–473 (1977)CrossRef
Metadaten
Titel
Using Synthetic Networks for Parameter Tuning in Community Detection
verfasst von
Liudmila Prokhorenkova
Copyright-Jahr
2019
DOI
https://doi.org/10.1007/978-3-030-25070-6_1