2017 | OriginalPaper | Chapter

K-Clique-Graphs for Dense Subgraph Discovery

Authors: Giannis Nikolentzos, Polykarpos Meladianos, Yannis Stavrakas, Michalis Vazirgiannis

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing

Abstract

Finding dense subgraphs is a fundamental graph mining task with applications in biology, finance, and spam detection, among other fields. Standard formulations of the problem, such as finding the maximum clique of a graph, are hard to solve. Tractable formulations have also been proposed, most of which optimize a density function such as the degree density or the triangle density. However, maximizing degree density usually yields large subgraphs with small density, while maximizing triangle density does not necessarily yield subgraphs that are close to being cliques.
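To make the density functions named above concrete, the following minimal sketch computes them for a vertex subset of a small undirected graph. It assumes the definitions commonly used in the dense-subgraph literature (edges per vertex for degree density, triangles per vertex for triangle density, and the fraction of realized vertex pairs for plain edge density); the chapter itself may normalize differently. The function names are hypothetical, and the brute-force triangle count is only suitable for tiny inputs.

```python
from itertools import combinations


def induced_edges(edges, subset):
    """Return the undirected edges with both endpoints in `subset`."""
    s = set(subset)
    return {frozenset((u, v)) for u, v in edges if u != v and u in s and v in s}


def degree_density(edges, subset):
    """Edges per vertex in the induced subgraph: |E(S)| / |S| (assumed definition)."""
    return len(induced_edges(edges, subset)) / len(set(subset))


def triangle_density(edges, subset):
    """Triangles per vertex in the induced subgraph: t(S) / |S| (assumed definition)."""
    s = set(subset)
    e = induced_edges(edges, s)
    # Brute force over all vertex triples; fine for illustration only.
    triangles = sum(
        1
        for a, b, c in combinations(s, 3)
        if frozenset((a, b)) in e and frozenset((a, c)) in e and frozenset((b, c)) in e
    )
    return triangles / len(s)


def edge_density(edges, subset):
    """Fraction of vertex pairs that are edges: |E(S)| / C(|S|, 2). Needs |S| >= 2."""
    n = len(set(subset))
    return len(induced_edges(edges, subset)) / (n * (n - 1) / 2)


# A 4-clique on vertices 0..3 with a pendant path 3-4-5 attached.
k4_with_tail = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
clique = {0, 1, 2, 3}
everything = {0, 1, 2, 3, 4, 5}
```

Comparing `edge_density(k4_with_tail, clique)` with `edge_density(k4_with_tail, everything)` shows how adding loosely attached vertices lowers the edge density even when the edges-per-vertex ratio changes only modestly.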

In this paper, we introduce the k-clique-graph densest subgraph problem, \(k \ge 3\), a novel formulation for the discovery of dense subgraphs. Given an input graph, its k-clique-graph is a new graph in which each vertex corresponds to a k-clique of the input graph and two vertices are connected by an edge if the corresponding k-cliques share a common \((k-1)\)-clique. We define a simple density function, the k-clique-graph density, which yields subgraphs that are both compact and dense, and we project the resulting subgraphs back to the input graph. We focus on the triangle-graph densest subgraph problem, obtained for \(k = 3\). To optimize the proposed function, we provide an exact algorithm. Furthermore, we present an efficient greedy approximation algorithm that scales well to larger graphs.
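The construction described above can be sketched in a few lines of Python. This is an illustrative brute-force version, not the authors' implementation (their code is linked from the abstract): it enumerates k-cliques by testing every k-subset of vertices, which is only practical for small graphs. The names `k_clique_graph` and `project_to_vertices` are hypothetical, and vertex labels are assumed to be mutually comparable so they can be sorted.

```python
from itertools import combinations


def k_clique_graph(edges, k=3):
    """Return (k-cliques, edges between k-cliques sharing a (k-1)-clique)."""
    # Build adjacency sets for the undirected input graph, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate k-cliques by brute force: keep each k-subset whose
    # vertices are pairwise adjacent. Tuples come out sorted.
    nodes = sorted(adj)
    cliques = [
        c for c in combinations(nodes, k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Index the k-cliques by every (k-1)-subset they contain.
    by_face = {}
    for c in cliques:
        for face in combinations(c, k - 1):
            by_face.setdefault(face, []).append(c)

    # Two k-cliques that appear under the same (k-1)-subset are adjacent
    # in the k-clique-graph.
    clique_edges = set()
    for group in by_face.values():
        for a, b in combinations(group, 2):
            clique_edges.add((a, b))
    return cliques, clique_edges


def project_to_vertices(cliques):
    """Map a set of k-clique-graph vertices back to input-graph vertices."""
    return set().union(*cliques) if cliques else set()


# Two triangles glued along the edge (1, 2).
two_triangles = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
tri_nodes, tri_edges = k_clique_graph(two_triangles, k=3)

# The complete graph on four vertices.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
k4_nodes, k4_edges = k_clique_graph(k4, k=3)
```

For the two glued triangles, the triangle-graph has two vertices joined by one edge. For the 4-clique, every pair of its four triangles shares an edge, so the triangle-graph is itself a 4-clique.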

We evaluate the proposed algorithms on real datasets and compare them with other algorithms in terms of the size and the density of the extracted subgraphs. The results confirm that the proposed algorithms find high-quality subgraphs in terms of both size and density. Finally, we apply the proposed method to the important problem of keyword extraction from textual documents. Code related to this chapter is available at: https://github.com/giannisnik/k-clique-graphs-dense-subgraphs.


https://networkdata.ics.uci.edu/index.php.

http://snap.stanford.edu/data/index.html.

Code is available at https://github.com/giannisnik/k-clique-graphs-dense-subgraphs.


Title: K-Clique-Graphs for Dense Subgraph Discovery
Authors: Giannis Nikolentzos, Polykarpos Meladianos, Yannis Stavrakas, Michalis Vazirgiannis
Publisher: Springer International Publishing
Book: Machine Learning and Knowledge Discovery in Databases
Print ISBN: 978-3-319-71248-2
Electronic ISBN: 978-3-319-71249-9
Copyright Year: 2017
DOI: https://doi.org/10.1007/978-3-319-71249-9_37
