Skip to main content
Erschienen in: The VLDB Journal 2/2016

01.04.2016 | Regular Paper

Diversified top-k clique search

verfasst von: Long Yuan, Lu Qin, Xuemin Lin, Lijun Chang, Wenjie Zhang

Erschienen in: The VLDB Journal | Ausgabe 2/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Maximal clique enumeration is a fundamental problem in graph theory and has been extensively studied. However, maximal clique enumeration is time-consuming in large graphs and always returns enormous cliques with large overlaps. Motivated by this, in this paper, we study the diversified top-k clique search problem which is to find top-k cliques that can cover most number of nodes in the graph. Diversified top-k clique search can be widely used in a lot of applications including community search, motif discovery, and anomaly detection in large graphs. A naive solution for diversified top-k clique search is to keep all maximal cliques in memory and then find k of them that cover most nodes in the graph by using the approximate greedy max k-cover algorithm. However, such a solution is impractical when the graph is large. In this paper, instead of keeping all maximal cliques in memory, we devise an algorithm to maintain k candidates in the process of maximal clique enumeration. Our algorithm has limited memory footprint and can achieve a guaranteed approximation ratio. We also introduce a novel light-weight \(\mathsf {PNP}\)-\(\mathsf {Index}\), based on which we design an optimal maximal clique maintenance algorithm. We further explore three optimization strategies to avoid enumerating all maximal cliques and thus largely reduce the computational cost. Besides, for the massive input graph, we develop an I/O efficient algorithm to tackle the problem when the input graph cannot fit in main memory. We conduct extensive performance studies on real graphs and synthetic graphs. One of the real graphs contains 1.02 billion edges. The results demonstrate the high efficiency and effectiveness of our approach.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Aggarwal, A., Vitter, J., et al.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988)MathSciNetCrossRef Aggarwal, A., Vitter, J., et al.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988)MathSciNetCrossRef
2.
Zurück zum Zitat Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying search results. In: Proceedings of WSDM’09, pp. 5–14 (2009) Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying search results. In: Proceedings of WSDM’09, pp. 5–14 (2009)
4.
Zurück zum Zitat Angel, A., Koudas, N.: Efficient diversity-aware search. In: Proceedings of SIGMOD’11, pp. 781–792 (2011) Angel, A., Koudas, N.: Efficient diversity-aware search. In: Proceedings of SIGMOD’11, pp. 781–792 (2011)
5.
Zurück zum Zitat Ausiello, G., Boria, N., Giannakos, A., Lucarelli, G., Paschos, V.T.: Online maximum k-coverage. Discrete Appl. Math. 160(13–14), 1901–1913 (2012)MathSciNetCrossRefMATH Ausiello, G., Boria, N., Giannakos, A., Lucarelli, G., Paschos, V.T.: Online maximum k-coverage. Discrete Appl. Math. 160(13–14), 1901–1913 (2012)MathSciNetCrossRefMATH
6.
Zurück zum Zitat Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., Krause, A.: Streaming submodular maximization: massive data summarization on the fly. In: Proceedings of KDD’14, pp. 671–680 (2014) Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., Krause, A.: Streaming submodular maximization: massive data summarization on the fly. In: Proceedings of KDD’14, pp. 671–680 (2014)
7.
Zurück zum Zitat Batagelj, V., Zaversnik, M.: An o(m) algorithm for cores decomposition of networks. CoRR. cs.DS/0310049 (2003) Batagelj, V., Zaversnik, M.: An o(m) algorithm for cores decomposition of networks. CoRR. cs.DS/0310049 (2003)
8.
Zurück zum Zitat Bernard, H.R., Killworth, P.D., Sailer, L.: Informant accuracy in social network data IV: a comparison of clique-level structure in behavioral and cognitive network data. Soc. Netw. 2(3), 191–218 (1979)CrossRef Bernard, H.R., Killworth, P.D., Sailer, L.: Informant accuracy in social network data IV: a comparison of clique-level structure in behavioral and cognitive network data. Soc. Netw. 2(3), 191–218 (1979)CrossRef
9.
Zurück zum Zitat Berry, N., Ko, T., Moy, T., Smrcka, J., Turnley, J., Wu, B.: Emergent clique formation in terrorist recruitment. In: Workshop on Agent Organizations: Theory and Practice (2004) Berry, N., Ko, T., Moy, T., Smrcka, J., Turnley, J., Wu, B.: Emergent clique formation in terrorist recruitment. In: Workshop on Agent Organizations: Theory and Practice (2004)
10.
Zurück zum Zitat Borodin, A., Lee, H.C., Ye, Y.: Max-sum diversification, monotone submodular functions and dynamic updates. In: Proceedings of PODS’12, pp. 155–166 (2012) Borodin, A., Lee, H.C., Ye, Y.: Max-sum diversification, monotone submodular functions and dynamic updates. In: Proceedings of PODS’12, pp. 155–166 (2012)
11.
Zurück zum Zitat Bron, C., Kerbosch, J.: Finding all cliques of an undirected graph (algorithm 457). Commun. ACM 16(9), 575–576 (1973)CrossRefMATH Bron, C., Kerbosch, J.: Finding all cliques of an undirected graph (algorithm 457). Commun. ACM 16(9), 575–576 (1973)CrossRefMATH
12.
Zurück zum Zitat Carraghan, R., Pardalos, P.M.: An exact algorithm for the maximum clique problem. Operat. Res. Lett. 9(6), 375–382 (1990)CrossRefMATH Carraghan, R., Pardalos, P.M.: An exact algorithm for the maximum clique problem. Operat. Res. Lett. 9(6), 375–382 (1990)CrossRefMATH
13.
14.
Zurück zum Zitat Cheng, J., Ke, Y., Chu, S., Özsu, M.T.: Efficient core decomposition in massive networks. In: Proceedings of ICDE, pp. 51–62 (2011) Cheng, J., Ke, Y., Chu, S., Özsu, M.T.: Efficient core decomposition in massive networks. In: Proceedings of ICDE, pp. 51–62 (2011)
15.
Zurück zum Zitat Cheng, J., Ke, Y., Fu, A.W.-C., Yu, J.X., Zhu, L.: Finding maximal cliques in massive networks. ACM Trans. Database Syst. 36(4), 21:1–21:34 (2011)CrossRef Cheng, J., Ke, Y., Fu, A.W.-C., Yu, J.X., Zhu, L.: Finding maximal cliques in massive networks. ACM Trans. Database Syst. 36(4), 21:1–21:34 (2011)CrossRef
16.
Zurück zum Zitat Cheng, J., Zhu, L., Ke, Y., Chu, S.: Fast algorithms for maximal clique enumeration with limited memory. In: Proceedings of KDD’12, pp. 1240–1248 (2012) Cheng, J., Zhu, L., Ke, Y., Chu, S.: Fast algorithms for maximal clique enumeration with limited memory. In: Proceedings of KDD’12, pp. 1240–1248 (2012)
17.
Zurück zum Zitat Chierichetti, F., Kumar, R., Tomkins, A.: Max-cover in map-reduce. In: Proceedings of WWW’10, pp. 231–240 (2010) Chierichetti, F., Kumar, R., Tomkins, A.: Max-cover in map-reduce. In: Proceedings of WWW’10, pp. 231–240 (2010)
18.
Zurück zum Zitat Chu, S., Cheng, J.: Triangle listing in massive networks and its applications. In: Proceedings of SIGKDD, pp. 672–680 (2011) Chu, S., Cheng, J.: Triangle listing in massive networks and its applications. In: Proceedings of SIGKDD, pp. 672–680 (2011)
19.
Zurück zum Zitat Demidova, E., Fankhauser, P., Zhou, X., Nejdl, W.: DivQ: diversification for keyword search over structured databases. In: Proceedings of SIGIR’10, pp. 331–338 (2010) Demidova, E., Fankhauser, P., Zhou, X., Nejdl, W.: DivQ: diversification for keyword search over structured databases. In: Proceedings of SIGIR’10, pp. 331–338 (2010)
20.
21.
Zurück zum Zitat Drosou, M., Pitoura, E.: Search result diversification. SIGMOD Rec. 39(1), 41–47 (2010)CrossRef Drosou, M., Pitoura, E.: Search result diversification. SIGMOD Rec. 39(1), 41–47 (2010)CrossRef
22.
Zurück zum Zitat Eppstein, D., Loffler, M., Strash, D.: Listing all maximal cliques in sparse graphs in near-optimal time. ISAAC 1, 403–414 (2010)MathSciNetMATH Eppstein, D., Loffler, M., Strash, D.: Listing all maximal cliques in sparse graphs in near-optimal time. ISAAC 1, 403–414 (2010)MathSciNetMATH
23.
Zurück zum Zitat Eppstein, D., Strash, D.: Listing all maximal cliques in large sparse real-world graphs. In: Proceedings of SEA’11, pp. 364–375 (2011) Eppstein, D., Strash, D.: Listing all maximal cliques in large sparse real-world graphs. In: Proceedings of SEA’11, pp. 364–375 (2011)
24.
Zurück zum Zitat Fan, W., Wang, X., Wu, Y.: Diversified top-k graph pattern matching. PVLDB 6(13), 1510–1521 (2013) Fan, W., Wang, X., Wu, Y.: Diversified top-k graph pattern matching. PVLDB 6(13), 1510–1521 (2013)
26.
Zurück zum Zitat Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., San Francisco (1979)MATH Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., San Francisco (1979)MATH
27.
Zurück zum Zitat Hu, X., Tao, Y., Chung, C.: I/O-efficient algorithms on triangle listing and counting. ACM Trans. Database Syst. 39(4), 27:1–27:30 (2014)MathSciNetCrossRef Hu, X., Tao, Y., Chung, C.: I/O-efficient algorithms on triangle listing and counting. ACM Trans. Database Syst. 39(4), 27:1–27:30 (2014)MathSciNetCrossRef
28.
Zurück zum Zitat Karp, R.M.: Reducibility among combinatorial problems. In: Complexity of Computer Computations. Plenum Press (1972) Karp, R.M.: Reducibility among combinatorial problems. In: Complexity of Computer Computations. Plenum Press (1972)
29.
Zurück zum Zitat Konc, J., Janezic, D.: An improved branch and bound algorithm for the maximum clique problem. Proteins 4, 5 (2007)MathSciNetMATH Konc, J., Janezic, D.: An improved branch and bound algorithm for the maximum clique problem. Proteins 4, 5 (2007)MathSciNetMATH
30.
Zurück zum Zitat Lee, C., Reid, F., McDaid, A., Hurley, N.: Detecting highly overlapping community structure by greedy clique expansion. In: Workshop on Social Network Mining and Analysis (2010) Lee, C., Reid, F., McDaid, A., Hurley, N.: Detecting highly overlapping community structure by greedy clique expansion. In: Workshop on Social Network Mining and Analysis (2010)
31.
Zurück zum Zitat Lin, X., Yuan, Y., Zhang, Q., Zhang, Y.: Selecting stars: The k most representative skyline operator. In: Proceedings of ICDE, pp. 86–95 (2007) Lin, X., Yuan, Y., Zhang, Q., Zhang, Y.: Selecting stars: The k most representative skyline operator. In: Proceedings of ICDE, pp. 86–95 (2007)
32.
Zurück zum Zitat Minack, E., Siberski, W., Nejdl, W.: Incremental diversification for very large sets: a streaming-based approach. In: Proceedings of SIGIR’11, pp. 585–594 (2011) Minack, E., Siberski, W., Nejdl, W.: Incremental diversification for very large sets: a streaming-based approach. In: Proceedings of SIGIR’11, pp. 585–594 (2011)
33.
Zurück zum Zitat Östergård, P.R.: A fast algorithm for the maximum clique problem. Discrete Appl. Math. 120(1), 197–207 (2002)MathSciNetCrossRef Östergård, P.R.: A fast algorithm for the maximum clique problem. Discrete Appl. Math. 120(1), 197–207 (2002)MathSciNetCrossRef
34.
Zurück zum Zitat Qin, L., Yu, J.X., Chang, L.: Diversifying top-k results. PVLDB 5(11), 1124–1135 (2012) Qin, L., Yu, J.X., Chang, L.: Diversifying top-k results. PVLDB 5(11), 1124–1135 (2012)
35.
Zurück zum Zitat Robson, J.: Finding a maximum independent set in time \(O(2^{n/4})\). In: Technical report, 1251-01, LaBRI, Université de Bordeaux I (2001) Robson, J.: Finding a maximum independent set in time \(O(2^{n/4})\). In: Technical report, 1251-01, LaBRI, Université de Bordeaux I (2001)
36.
Zurück zum Zitat Saha, B., Getoor, L.: On maximum coverage in the streaming model & application to multi-topic blog-watch. In: Proceedings of SDM’09, pp. 697–708 (2009) Saha, B., Getoor, L.: On maximum coverage in the streaming model & application to multi-topic blog-watch. In: Proceedings of SDM’09, pp. 697–708 (2009)
37.
Zurück zum Zitat Schmidt, M.C., Samatova, N.F., Thomas, K., Park, B.-H.: A scalable, parallel algorithm for maximal clique enumeration. J. Parallel Distrib. Comput. 69(4), 417–428 (2009)CrossRef Schmidt, M.C., Samatova, N.F., Thomas, K., Park, B.-H.: A scalable, parallel algorithm for maximal clique enumeration. J. Parallel Distrib. Comput. 69(4), 417–428 (2009)CrossRef
38.
Zurück zum Zitat Suyudi, M., Mohd, I.B., Mamat, M., Sopiyan, S., Supriatna, A.K.: Solution of maximum clique problem by using branch and bound method. Appl. Math. Sci. 8(2), 81–90 (2014)MathSciNet Suyudi, M., Mohd, I.B., Mamat, M., Sopiyan, S., Supriatna, A.K.: Solution of maximum clique problem by using branch and bound method. Appl. Math. Sci. 8(2), 81–90 (2014)MathSciNet
39.
Zurück zum Zitat Tomita, E., Kameda, T.: An efficient branch-and-bound algorithm for finding a maximum clique with computational experiments. J. Global Optim. 37(1), 95–111 (2007)MathSciNetCrossRefMATH Tomita, E., Kameda, T.: An efficient branch-and-bound algorithm for finding a maximum clique with computational experiments. J. Global Optim. 37(1), 95–111 (2007)MathSciNetCrossRefMATH
40.
Zurück zum Zitat Tomita, E., Tanaka, A., Takahashi, H.: The worst-case time complexity for generating all maximal cliques and computational experiments. Theor. Comput. Sci. 363(1), 28–42 (2006)MathSciNetCrossRefMATH Tomita, E., Tanaka, A., Takahashi, H.: The worst-case time complexity for generating all maximal cliques and computational experiments. Theor. Comput. Sci. 363(1), 28–42 (2006)MathSciNetCrossRefMATH
41.
Zurück zum Zitat Vieira, M.R., Razente, H.L., Barioni, M.C.N., Hadjieleftheriou, M., Srivastava, D., Traina, Jr., C., Tsotras, V.J.: On query result diversification. In: Proceedings of ICDE’11 (2011) Vieira, M.R., Razente, H.L., Barioni, M.C.N., Hadjieleftheriou, M., Srivastava, D., Traina, Jr., C., Tsotras, V.J.: On query result diversification. In: Proceedings of ICDE’11 (2011)
42.
Zurück zum Zitat Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012) Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012)
43.
Zurück zum Zitat Wang, J., Cheng, J., Fu, A.W.-C.: Redundancy-aware maximal cliques. In: Proceedings of KDD’13, pp. 122–130 (2013) Wang, J., Cheng, J., Fu, A.W.-C.: Redundancy-aware maximal cliques. In: Proceedings of KDD’13, pp. 122–130 (2013)
44.
Zurück zum Zitat Welsh, D.J.A., Powell, M.B.: An upper bound for the chromatic number of a graph and its application to timetabling problems. Comput. J. 10(1), 85–86 (1967)CrossRefMATH Welsh, D.J.A., Powell, M.B.: An upper bound for the chromatic number of a graph and its application to timetabling problems. Comput. J. 10(1), 85–86 (1967)CrossRefMATH
45.
Zurück zum Zitat Xiang, J., Guo, C., Aboulnaga, A.: Scalable maximum clique computation using mapreduce. In Proceedings of ICDE’13, pp. 74–85 (2013) Xiang, J., Guo, C., Aboulnaga, A.: Scalable maximum clique computation using mapreduce. In Proceedings of ICDE’13, pp. 74–85 (2013)
46.
Zurück zum Zitat Xu, Y., Cheng, J., Fu, A.W.-C., Bu, Y.: Distributed maximal clique computation. In: Proceedings of BigData’14, pp. 160–167 (2014) Xu, Y., Cheng, J., Fu, A.W.-C., Bu, Y.: Distributed maximal clique computation. In: Proceedings of BigData’14, pp. 160–167 (2014)
47.
Zurück zum Zitat Yu, H., Yuan, D.: Set coverage problems in a one-pass data stream. In: Proceedings of SDM’13, pp. 758–766 (2013) Yu, H., Yuan, D.: Set coverage problems in a one-pass data stream. In: Proceedings of SDM’13, pp. 758–766 (2013)
48.
Zurück zum Zitat Yuan, L., Qin, L., Lin, X., Chang, L., Zhang, W.: Diversified top-k clique search. In: Proceedings of ICDE’15, pp. 387–398 (2015) Yuan, L., Qin, L., Lin, X., Chang, L., Zhang, W.: Diversified top-k clique search. In: Proceedings of ICDE’15, pp. 387–398 (2015)
49.
Zurück zum Zitat Zhang, Z., Qin, L., Yu, J.X.: Contract & expand: I/O efficient sccs computing. In: Proceedings of ICDE, pp. 208–219 (2014) Zhang, Z., Qin, L., Yu, J.X.: Contract & expand: I/O efficient sccs computing. In: Proceedings of ICDE, pp. 208–219 (2014)
50.
Zurück zum Zitat Zhang, Z., Yu, J.X., Qin, L., Chang, L., Lin, X.: I/O efficient: computing sccs in massive graphs. In: Proceedings of SIGMOD, pp. 181–192 (2013) Zhang, Z., Yu, J.X., Qin, L., Chang, L., Lin, X.: I/O efficient: computing sccs in massive graphs. In: Proceedings of SIGMOD, pp. 181–192 (2013)
51.
Zurück zum Zitat Zhang, Z., Yu, J.X., Qin, L., Shang, Z.: Divide & conquer: I/O efficient depth-first search. In: Proceedings of SIGMOD, pp. 445–458 (2015) Zhang, Z., Yu, J.X., Qin, L., Shang, Z.: Divide & conquer: I/O efficient depth-first search. In: Proceedings of SIGMOD, pp. 445–458 (2015)
52.
Zurück zum Zitat Zheng, X., Liu, T., Yang, Z., Wang, J.: Large cliques in Arabidopsis gene coexpression network and motif discovery. J. Plant Physiol. 168(6), 611–618 (2011)CrossRef Zheng, X., Liu, T., Yang, Z., Wang, J.: Large cliques in Arabidopsis gene coexpression network and motif discovery. J. Plant Physiol. 168(6), 611–618 (2011)CrossRef
Metadaten
Titel
Diversified top-k clique search
verfasst von
Long Yuan
Lu Qin
Xuemin Lin
Lijun Chang
Wenjie Zhang
Publikationsdatum
01.04.2016
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 2/2016
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-015-0408-z

Weitere Artikel der Ausgabe 2/2016

The VLDB Journal 2/2016 Zur Ausgabe

Premium Partner