Skip to main content
Top
Published in: The VLDB Journal 3/2019

08-05-2019 | Regular Paper

Efficient structural graph clustering: an index-based approach

Authors: Dong Wen, Lu Qin, Ying Zhang, Lijun Chang, Xuemin Lin

Published in: The VLDB Journal | Issue 3/2019

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Graph clustering is a fundamental problem widely applied in many applications. The structural graph clustering (\(\mathsf {SCAN}\)) method obtains not only clusters but also hubs and outliers. However, the clustering results heavily depend on two parameters, \(\epsilon \) and \(\mu \), while the optimal parameter setting depends on different graph properties and various user requirements. In addition, all existing \(\mathsf {SCAN}\) solutions need to scan at least the whole graph, even if only a small number of vertices belong to clusters. In this paper, we propose an index-based method for \(\mathsf {SCAN}\). Based on our index, we cluster the graph for any \(\epsilon \) and \(\mu \) in \(O(\sum _{C\in \mathbb {C}}|E_C|)\) time, where \(\mathbb {C} \) is the result set of all clusters and \(|E_C|\) is the number of edges in a specific cluster \(C\). In other words, the time spent on computing structural clustering depends only on the result size, not on the size of the original graph. Our index’s space complexity is O(m), where m is the number of edges in the graph. To handle dynamic graph updates, we propose algorithms and several optimization techniques for maintaining our index. We also design an index for I/O efficient query processing. We conduct extensive experiments to evaluate the performance of all our proposed algorithms on 10 real-world networks, with the largest one containing more than 1 billion edges. The experimental results demonstrate that our approaches significantly outperform existing solutions.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Bortner, D., Han, J.: Progressive clustering of networks using structure-connected order of traversal. In: Proceedings of ICDE’10, pp. 653–656 (2010) Bortner, D., Han, J.: Progressive clustering of networks using structure-connected order of traversal. In: Proceedings of ICDE’10, pp. 653–656 (2010)
2.
go back to reference Chang, L., Li, W., Lin, X., Qin, L., Zhang, W.: pSCAN: fast and exact structural graph clustering. In: ICDE, pp. 253–264 (2016) Chang, L., Li, W., Lin, X., Qin, L., Zhang, W.: pSCAN: fast and exact structural graph clustering. In: ICDE, pp. 253–264 (2016)
3.
go back to reference Cheng, J., Ke, Y., Chu, S., T. Özsu, M.: Efficient core decomposition in massive networks. In: ICDE, pp. 51–62 (2011) Cheng, J., Ke, Y., Chu, S., T. Özsu, M.: Efficient core decomposition in massive networks. In: ICDE, pp. 51–62 (2011)
4.
go back to reference Cheng, J., Ke, Y., Fu, A.W.-C., Yu, J.X., Zhu, L.: Finding maximal cliques in massive networks by h*-graph. In: SIGMOD, pp. 447–458 (2010) Cheng, J., Ke, Y., Fu, A.W.-C., Yu, J.X., Zhu, L.: Finding maximal cliques in massive networks by h*-graph. In: SIGMOD, pp. 447–458 (2010)
6.
go back to reference Ding, C.H., He, X., Zha, H., Gu, M., Simon, H.D.: A min–max cut algorithm for graph partitioning and data clustering. In: ICDM, pp. 107–114 (2001) Ding, C.H., He, X., Zha, H., Gu, M., Simon, H.D.: A min–max cut algorithm for graph partitioning and data clustering. In: ICDM, pp. 107–114 (2001)
8.
go back to reference Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network optimization algorithms. JACM 34(3), 596–615 (1987)MathSciNetCrossRefMATH Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network optimization algorithms. JACM 34(3), 596–615 (1987)MathSciNetCrossRefMATH
9.
go back to reference Guimera, R., Amaral, L.A.N.: Functional cartography of complex metabolic networks. Nature 433(7028), 895 (2005)CrossRef Guimera, R., Amaral, L.A.N.: Functional cartography of complex metabolic networks. Nature 433(7028), 895 (2005)CrossRef
10.
go back to reference Huang, J., Sun, H., Han, J., Deng, H., Sun, Y., Liu, Y.: Shrink: a structural clustering algorithm for detecting hierarchical communities in networks. In: CIKM, pp. 219–228 (2010) Huang, J., Sun, H., Han, J., Deng, H., Sun, Y., Liu, Y.: Shrink: a structural clustering algorithm for detecting hierarchical communities in networks. In: CIKM, pp. 219–228 (2010)
11.
go back to reference Jiang, P., Singh, M.: Spici: a fast clustering algorithm for large biological networks. Bioinformatics 26(8), 1105–1111 (2010)CrossRef Jiang, P., Singh, M.: Spici: a fast clustering algorithm for large biological networks. Bioinformatics 26(8), 1105–1111 (2010)CrossRef
12.
go back to reference Kang, U., Faloutsos, C.: Beyond ‘Caveman Communities’: hubs and spokes for graph compression and mining. In: ICDM, pp. 300–309 (2011) Kang, U., Faloutsos, C.: Beyond ‘Caveman Communities’: hubs and spokes for graph compression and mining. In: ICDM, pp. 300–309 (2011)
13.
go back to reference Lee, V. E., Ruan, N., Jin, R., Aggarwal, C.: A survey of algorithms for dense subgraph discovery. In: Managing and Mining Graph Data, pp. 303–336 (2010) Lee, V. E., Ruan, N., Jin, R., Aggarwal, C.: A survey of algorithms for dense subgraph discovery. In: Managing and Mining Graph Data, pp. 303–336 (2010)
14.
go back to reference Lim, S., Ryu, S., Kwon, S., Jung, K., Lee, J.-G.: Linkscan*: overlapping community detection using the link-space transformation. In: ICDE, pp. 292–303 (2014) Lim, S., Ryu, S., Kwon, S., Jung, K., Lee, J.-G.: Linkscan*: overlapping community detection using the link-space transformation. In: ICDE, pp. 292–303 (2014)
15.
go back to reference Mai, S.T., Dieu, M.S., Assent, I., Jacobsen, J., Kristensen, J., Birk, M.: Scalable and interactive graph clustering algorithm on multicore cpus. In: ICDE, pp. 349–360 (2017) Mai, S.T., Dieu, M.S., Assent, I., Jacobsen, J., Kristensen, J., Birk, M.: Scalable and interactive graph clustering algorithm on multicore cpus. In: ICDE, pp. 349–360 (2017)
16.
go back to reference Newman, M.E., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)CrossRef Newman, M.E., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)CrossRef
18.
go back to reference Shi, J., Malik, J.: Normalized cuts and image segmentation. TPAMI 22(8), 888–905 (2000)CrossRef Shi, J., Malik, J.: Normalized cuts and image segmentation. TPAMI 22(8), 888–905 (2000)CrossRef
19.
go back to reference Shiokawa, H., Fujiwara, Y., Onizuka, M.: Fast algorithm for modularity-based graph clustering. In: AAAI, pp. 1170–1176 (2013) Shiokawa, H., Fujiwara, Y., Onizuka, M.: Fast algorithm for modularity-based graph clustering. In: AAAI, pp. 1170–1176 (2013)
20.
go back to reference Shiokawa, H., Fujiwara, Y., Onizuka, M.: Scan++: efficient algorithm for finding clusters, hubs and outliers on large-scale graphs. PVLDB 8(11), 1178–1189 (2015) Shiokawa, H., Fujiwara, Y., Onizuka, M.: Scan++: efficient algorithm for finding clusters, hubs and outliers on large-scale graphs. PVLDB 8(11), 1178–1189 (2015)
21.
go back to reference Shiokawa, H., Takahashi, T., Kitagawa, H.: Scalescan: scalable density-based graph clustering. In: Database and Expert Systems Applications, pp. 18–34 (2018) Shiokawa, H., Takahashi, T., Kitagawa, H.: Scalescan: scalable density-based graph clustering. In: Database and Expert Systems Applications, pp. 18–34 (2018)
22.
go back to reference Son, M. T., Amer-Yahia, S., Assent, I., Birk, M., Storgaard Dieu, M. Jacobsen, J., Kristensen, J.: Scalable interactive dynamic graph clustering on multicore CPUs. In: TKDE (2018) Son, M. T., Amer-Yahia, S., Assent, I., Birk, M., Storgaard Dieu, M. Jacobsen, J., Kristensen, J.: Scalable interactive dynamic graph clustering on multicore CPUs. In: TKDE (2018)
23.
go back to reference Sun, H., Huang, J., Han, J. Deng, H., Zhao, P., Feng, B.: gSkeletonClu: Density-based network clustering via structure-connected tree division or agglomeration. In: ICDM, pp. 481–490 (2010) Sun, H., Huang, J., Han, J. Deng, H., Zhao, P., Feng, B.: gSkeletonClu: Density-based network clustering via structure-connected tree division or agglomeration. In: ICDM, pp. 481–490 (2010)
24.
go back to reference Takahashi, T., Shiokawa, H., Kitagawa, H.: SCAN-XP: parallel structural graph clustering algorithm on Intel Xeon Phi coprocessors. In: Proceedings of the 2nd International Workshop on Network Data Analytics, NDA, pp. 6:1–6:7 (2017) Takahashi, T., Shiokawa, H., Kitagawa, H.: SCAN-XP: parallel structural graph clustering algorithm on Intel Xeon Phi coprocessors. In: Proceedings of the 2nd International Workshop on Network Data Analytics, NDA, pp. 6:1–6:7 (2017)
25.
go back to reference Tsourakakis, C., Bonchi, F., Gionis, A., Gullo, F., Tsiarli, M.: Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In: KDD, pp. 104–112. ACM (2013) Tsourakakis, C., Bonchi, F., Gionis, A., Gullo, F., Tsiarli, M.: Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In: KDD, pp. 104–112. ACM (2013)
26.
go back to reference Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012) Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012)
27.
go back to reference Wang, L., Xiao, Y., Shao, B., Wang, H.: How to partition a billion-node graph. In: ICDE (2014) Wang, L., Xiao, Y., Shao, B., Wang, H.: How to partition a billion-node graph. In: ICDE (2014)
28.
go back to reference Wang, N., Zhang, J., Tan, K.-L., Tung, A.K.: On triangulation-based dense neighborhood graph discovery. PVLDB 4(2), 58–68 (2010) Wang, N., Zhang, J., Tan, K.-L., Tung, A.K.: On triangulation-based dense neighborhood graph discovery. PVLDB 4(2), 58–68 (2010)
29.
go back to reference Wen, D., Qin, L., Zhang, Y., Chang, L., Lin, X.: Efficient structural graph clustering: an index-based approach. PVLDB 11(3), 243–255 (2017) Wen, D., Qin, L., Zhang, Y., Chang, L., Lin, X.: Efficient structural graph clustering: an index-based approach. PVLDB 11(3), 243–255 (2017)
30.
go back to reference Wen, D., Qin, L., Zhang, Y., Lin, X., Yu, J.X.: I/o efficient core graph decomposition at web scale. In: ICDE, pp. 133–144 (2016) Wen, D., Qin, L., Zhang, Y., Lin, X., Yu, J.X.: I/o efficient core graph decomposition at web scale. In: ICDE, pp. 133–144 (2016)
31.
go back to reference Xu, X., Yuruk, N., Feng, Z., Schweiger, T.A.J.: Scan: a structural clustering algorithm for networks. In: KDD, pp. 824–833 (2007) Xu, X., Yuruk, N., Feng, Z., Schweiger, T.A.J.: Scan: a structural clustering algorithm for networks. In: KDD, pp. 824–833 (2007)
32.
go back to reference Zhang, Z., Yu, J.X., Qin, L., Shang, Z.: Divide & conquer: I/o efficient depth-first search. In: SIGMOD, pp. 445–458 (2015) Zhang, Z., Yu, J.X., Qin, L., Shang, Z.: Divide & conquer: I/o efficient depth-first search. In: SIGMOD, pp. 445–458 (2015)
33.
go back to reference Zhao, W., Chen, G., Xu, X.: AnySCAN: an efficient anytime framework with active learning for large-scale network clustering. In: ICDM, pp. 665–674 (2017) Zhao, W., Chen, G., Xu, X.: AnySCAN: an efficient anytime framework with active learning for large-scale network clustering. In: ICDM, pp. 665–674 (2017)
34.
go back to reference Zhao, W., Martha, V., Xu, X.: PSCAN: a parallel structural clustering algorithm for big networks in MapReduce. In: AINA, pp. 862–869 (2013) Zhao, W., Martha, V., Xu, X.: PSCAN: a parallel structural clustering algorithm for big networks in MapReduce. In: AINA, pp. 862–869 (2013)
35.
go back to reference Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. PVLDB 2(1), 718–729 (2009) Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. PVLDB 2(1), 718–729 (2009)
36.
go back to reference Zilberstein, S.: Using anytime algorithms in intelligent systems. AI Mag. 17, 73–83 (1996) Zilberstein, S.: Using anytime algorithms in intelligent systems. AI Mag. 17, 73–83 (1996)
Metadata
Title
Efficient structural graph clustering: an index-based approach
Authors
Dong Wen
Lu Qin
Ying Zhang
Lijun Chang
Xuemin Lin
Publication date
08-05-2019
Publisher
Springer Berlin Heidelberg
Published in
The VLDB Journal / Issue 3/2019
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-019-00541-4

Other articles of this Issue 3/2019

The VLDB Journal 3/2019 Go to the issue

Premium Partner