Skip to main content
Erschienen in: The VLDB Journal 3/2019

08.05.2019 | Regular Paper

Efficient structural graph clustering: an index-based approach

verfasst von: Dong Wen, Lu Qin, Ying Zhang, Lijun Chang, Xuemin Lin

Erschienen in: The VLDB Journal | Ausgabe 3/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Graph clustering is a fundamental problem widely applied in many applications. The structural graph clustering (\(\mathsf {SCAN}\)) method obtains not only clusters but also hubs and outliers. However, the clustering results heavily depend on two parameters, \(\epsilon \) and \(\mu \), while the optimal parameter setting depends on different graph properties and various user requirements. In addition, all existing \(\mathsf {SCAN}\) solutions need to scan at least the whole graph, even if only a small number of vertices belong to clusters. In this paper, we propose an index-based method for \(\mathsf {SCAN}\). Based on our index, we cluster the graph for any \(\epsilon \) and \(\mu \) in \(O(\sum _{C\in \mathbb {C}}|E_C|)\) time, where \(\mathbb {C} \) is the result set of all clusters and \(|E_C|\) is the number of edges in a specific cluster \(C\). In other words, the time spent on computing structural clustering depends only on the result size, not on the size of the original graph. Our index’s space complexity is O(m), where m is the number of edges in the graph. To handle dynamic graph updates, we propose algorithms and several optimization techniques for maintaining our index. We also design an index for I/O efficient query processing. We conduct extensive experiments to evaluate the performance of all our proposed algorithms on 10 real-world networks, with the largest one containing more than 1 billion edges. The experimental results demonstrate that our approaches significantly outperform existing solutions.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Bortner, D., Han, J.: Progressive clustering of networks using structure-connected order of traversal. In: Proceedings of ICDE’10, pp. 653–656 (2010) Bortner, D., Han, J.: Progressive clustering of networks using structure-connected order of traversal. In: Proceedings of ICDE’10, pp. 653–656 (2010)
2.
Zurück zum Zitat Chang, L., Li, W., Lin, X., Qin, L., Zhang, W.: pSCAN: fast and exact structural graph clustering. In: ICDE, pp. 253–264 (2016) Chang, L., Li, W., Lin, X., Qin, L., Zhang, W.: pSCAN: fast and exact structural graph clustering. In: ICDE, pp. 253–264 (2016)
3.
Zurück zum Zitat Cheng, J., Ke, Y., Chu, S., T. Özsu, M.: Efficient core decomposition in massive networks. In: ICDE, pp. 51–62 (2011) Cheng, J., Ke, Y., Chu, S., T. Özsu, M.: Efficient core decomposition in massive networks. In: ICDE, pp. 51–62 (2011)
4.
Zurück zum Zitat Cheng, J., Ke, Y., Fu, A.W.-C., Yu, J.X., Zhu, L.: Finding maximal cliques in massive networks by h*-graph. In: SIGMOD, pp. 447–458 (2010) Cheng, J., Ke, Y., Fu, A.W.-C., Yu, J.X., Zhu, L.: Finding maximal cliques in massive networks by h*-graph. In: SIGMOD, pp. 447–458 (2010)
6.
Zurück zum Zitat Ding, C.H., He, X., Zha, H., Gu, M., Simon, H.D.: A min–max cut algorithm for graph partitioning and data clustering. In: ICDM, pp. 107–114 (2001) Ding, C.H., He, X., Zha, H., Gu, M., Simon, H.D.: A min–max cut algorithm for graph partitioning and data clustering. In: ICDM, pp. 107–114 (2001)
8.
Zurück zum Zitat Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network optimization algorithms. JACM 34(3), 596–615 (1987)MathSciNetCrossRefMATH Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network optimization algorithms. JACM 34(3), 596–615 (1987)MathSciNetCrossRefMATH
9.
Zurück zum Zitat Guimera, R., Amaral, L.A.N.: Functional cartography of complex metabolic networks. Nature 433(7028), 895 (2005)CrossRef Guimera, R., Amaral, L.A.N.: Functional cartography of complex metabolic networks. Nature 433(7028), 895 (2005)CrossRef
10.
Zurück zum Zitat Huang, J., Sun, H., Han, J., Deng, H., Sun, Y., Liu, Y.: Shrink: a structural clustering algorithm for detecting hierarchical communities in networks. In: CIKM, pp. 219–228 (2010) Huang, J., Sun, H., Han, J., Deng, H., Sun, Y., Liu, Y.: Shrink: a structural clustering algorithm for detecting hierarchical communities in networks. In: CIKM, pp. 219–228 (2010)
11.
Zurück zum Zitat Jiang, P., Singh, M.: Spici: a fast clustering algorithm for large biological networks. Bioinformatics 26(8), 1105–1111 (2010)CrossRef Jiang, P., Singh, M.: Spici: a fast clustering algorithm for large biological networks. Bioinformatics 26(8), 1105–1111 (2010)CrossRef
12.
Zurück zum Zitat Kang, U., Faloutsos, C.: Beyond ‘Caveman Communities’: hubs and spokes for graph compression and mining. In: ICDM, pp. 300–309 (2011) Kang, U., Faloutsos, C.: Beyond ‘Caveman Communities’: hubs and spokes for graph compression and mining. In: ICDM, pp. 300–309 (2011)
13.
Zurück zum Zitat Lee, V. E., Ruan, N., Jin, R., Aggarwal, C.: A survey of algorithms for dense subgraph discovery. In: Managing and Mining Graph Data, pp. 303–336 (2010) Lee, V. E., Ruan, N., Jin, R., Aggarwal, C.: A survey of algorithms for dense subgraph discovery. In: Managing and Mining Graph Data, pp. 303–336 (2010)
14.
Zurück zum Zitat Lim, S., Ryu, S., Kwon, S., Jung, K., Lee, J.-G.: Linkscan*: overlapping community detection using the link-space transformation. In: ICDE, pp. 292–303 (2014) Lim, S., Ryu, S., Kwon, S., Jung, K., Lee, J.-G.: Linkscan*: overlapping community detection using the link-space transformation. In: ICDE, pp. 292–303 (2014)
15.
Zurück zum Zitat Mai, S.T., Dieu, M.S., Assent, I., Jacobsen, J., Kristensen, J., Birk, M.: Scalable and interactive graph clustering algorithm on multicore cpus. In: ICDE, pp. 349–360 (2017) Mai, S.T., Dieu, M.S., Assent, I., Jacobsen, J., Kristensen, J., Birk, M.: Scalable and interactive graph clustering algorithm on multicore cpus. In: ICDE, pp. 349–360 (2017)
16.
Zurück zum Zitat Newman, M.E., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)CrossRef Newman, M.E., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)CrossRef
17.
18.
Zurück zum Zitat Shi, J., Malik, J.: Normalized cuts and image segmentation. TPAMI 22(8), 888–905 (2000)CrossRef Shi, J., Malik, J.: Normalized cuts and image segmentation. TPAMI 22(8), 888–905 (2000)CrossRef
19.
Zurück zum Zitat Shiokawa, H., Fujiwara, Y., Onizuka, M.: Fast algorithm for modularity-based graph clustering. In: AAAI, pp. 1170–1176 (2013) Shiokawa, H., Fujiwara, Y., Onizuka, M.: Fast algorithm for modularity-based graph clustering. In: AAAI, pp. 1170–1176 (2013)
20.
Zurück zum Zitat Shiokawa, H., Fujiwara, Y., Onizuka, M.: Scan++: efficient algorithm for finding clusters, hubs and outliers on large-scale graphs. PVLDB 8(11), 1178–1189 (2015) Shiokawa, H., Fujiwara, Y., Onizuka, M.: Scan++: efficient algorithm for finding clusters, hubs and outliers on large-scale graphs. PVLDB 8(11), 1178–1189 (2015)
21.
Zurück zum Zitat Shiokawa, H., Takahashi, T., Kitagawa, H.: Scalescan: scalable density-based graph clustering. In: Database and Expert Systems Applications, pp. 18–34 (2018) Shiokawa, H., Takahashi, T., Kitagawa, H.: Scalescan: scalable density-based graph clustering. In: Database and Expert Systems Applications, pp. 18–34 (2018)
22.
Zurück zum Zitat Son, M. T., Amer-Yahia, S., Assent, I., Birk, M., Storgaard Dieu, M. Jacobsen, J., Kristensen, J.: Scalable interactive dynamic graph clustering on multicore CPUs. In: TKDE (2018) Son, M. T., Amer-Yahia, S., Assent, I., Birk, M., Storgaard Dieu, M. Jacobsen, J., Kristensen, J.: Scalable interactive dynamic graph clustering on multicore CPUs. In: TKDE (2018)
23.
Zurück zum Zitat Sun, H., Huang, J., Han, J. Deng, H., Zhao, P., Feng, B.: gSkeletonClu: Density-based network clustering via structure-connected tree division or agglomeration. In: ICDM, pp. 481–490 (2010) Sun, H., Huang, J., Han, J. Deng, H., Zhao, P., Feng, B.: gSkeletonClu: Density-based network clustering via structure-connected tree division or agglomeration. In: ICDM, pp. 481–490 (2010)
24.
Zurück zum Zitat Takahashi, T., Shiokawa, H., Kitagawa, H.: SCAN-XP: parallel structural graph clustering algorithm on Intel Xeon Phi coprocessors. In: Proceedings of the 2nd International Workshop on Network Data Analytics, NDA, pp. 6:1–6:7 (2017) Takahashi, T., Shiokawa, H., Kitagawa, H.: SCAN-XP: parallel structural graph clustering algorithm on Intel Xeon Phi coprocessors. In: Proceedings of the 2nd International Workshop on Network Data Analytics, NDA, pp. 6:1–6:7 (2017)
25.
Zurück zum Zitat Tsourakakis, C., Bonchi, F., Gionis, A., Gullo, F., Tsiarli, M.: Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In: KDD, pp. 104–112. ACM (2013) Tsourakakis, C., Bonchi, F., Gionis, A., Gullo, F., Tsiarli, M.: Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In: KDD, pp. 104–112. ACM (2013)
26.
Zurück zum Zitat Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012) Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012)
27.
Zurück zum Zitat Wang, L., Xiao, Y., Shao, B., Wang, H.: How to partition a billion-node graph. In: ICDE (2014) Wang, L., Xiao, Y., Shao, B., Wang, H.: How to partition a billion-node graph. In: ICDE (2014)
28.
Zurück zum Zitat Wang, N., Zhang, J., Tan, K.-L., Tung, A.K.: On triangulation-based dense neighborhood graph discovery. PVLDB 4(2), 58–68 (2010) Wang, N., Zhang, J., Tan, K.-L., Tung, A.K.: On triangulation-based dense neighborhood graph discovery. PVLDB 4(2), 58–68 (2010)
29.
Zurück zum Zitat Wen, D., Qin, L., Zhang, Y., Chang, L., Lin, X.: Efficient structural graph clustering: an index-based approach. PVLDB 11(3), 243–255 (2017) Wen, D., Qin, L., Zhang, Y., Chang, L., Lin, X.: Efficient structural graph clustering: an index-based approach. PVLDB 11(3), 243–255 (2017)
30.
Zurück zum Zitat Wen, D., Qin, L., Zhang, Y., Lin, X., Yu, J.X.: I/o efficient core graph decomposition at web scale. In: ICDE, pp. 133–144 (2016) Wen, D., Qin, L., Zhang, Y., Lin, X., Yu, J.X.: I/o efficient core graph decomposition at web scale. In: ICDE, pp. 133–144 (2016)
31.
Zurück zum Zitat Xu, X., Yuruk, N., Feng, Z., Schweiger, T.A.J.: Scan: a structural clustering algorithm for networks. In: KDD, pp. 824–833 (2007) Xu, X., Yuruk, N., Feng, Z., Schweiger, T.A.J.: Scan: a structural clustering algorithm for networks. In: KDD, pp. 824–833 (2007)
32.
Zurück zum Zitat Zhang, Z., Yu, J.X., Qin, L., Shang, Z.: Divide & conquer: I/o efficient depth-first search. In: SIGMOD, pp. 445–458 (2015) Zhang, Z., Yu, J.X., Qin, L., Shang, Z.: Divide & conquer: I/o efficient depth-first search. In: SIGMOD, pp. 445–458 (2015)
33.
Zurück zum Zitat Zhao, W., Chen, G., Xu, X.: AnySCAN: an efficient anytime framework with active learning for large-scale network clustering. In: ICDM, pp. 665–674 (2017) Zhao, W., Chen, G., Xu, X.: AnySCAN: an efficient anytime framework with active learning for large-scale network clustering. In: ICDM, pp. 665–674 (2017)
34.
Zurück zum Zitat Zhao, W., Martha, V., Xu, X.: PSCAN: a parallel structural clustering algorithm for big networks in MapReduce. In: AINA, pp. 862–869 (2013) Zhao, W., Martha, V., Xu, X.: PSCAN: a parallel structural clustering algorithm for big networks in MapReduce. In: AINA, pp. 862–869 (2013)
35.
Zurück zum Zitat Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. PVLDB 2(1), 718–729 (2009) Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. PVLDB 2(1), 718–729 (2009)
36.
Zurück zum Zitat Zilberstein, S.: Using anytime algorithms in intelligent systems. AI Mag. 17, 73–83 (1996) Zilberstein, S.: Using anytime algorithms in intelligent systems. AI Mag. 17, 73–83 (1996)
Metadaten
Titel
Efficient structural graph clustering: an index-based approach
verfasst von
Dong Wen
Lu Qin
Ying Zhang
Lijun Chang
Xuemin Lin
Publikationsdatum
08.05.2019
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 3/2019
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-019-00541-4

Weitere Artikel der Ausgabe 3/2019

The VLDB Journal 3/2019 Zur Ausgabe