Skip to main content

2018 | OriginalPaper | Buchkapitel

Scalable Approximation Algorithm for Graph Summarization

verfasst von : Maham Anwar Beg, Muhammad Ahmad, Arif Zaman, Imdadullah Khan

Erschienen in: Advances in Knowledge Discovery and Data Mining

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Massive sizes of real-world graphs, such as social networks and web graph, impose serious challenges to process and perform analytics on them. These issues can be resolved by working on a small summary of the graph instead. A summary is a compressed version of the graph that removes several details, yet preserves it’s essential structure. Generally, some predefined quality measure of the summary is optimized to bound the approximation error incurred by working on the summary instead of the whole graph. All known summarization algorithms are computationally prohibitive and do not scale to large graphs. In this paper we present an efficient randomized algorithm to compute graph summaries with the goal to minimize reconstruction error. We propose a novel weighted sampling scheme to sample vertices for merging that will result in the least reconstruction error. We provide analytical bounds on the running time of the algorithm and prove approximation guarantee for our score computation. Efficiency of our algorithm makes it scalable to very large graphs on which known algorithms cannot be applied. We test our algorithm on several real world graphs to empirically demonstrate the quality of summaries produced and compare to state of the art algorithms. We use the summaries to answer several structural queries about original graph and report their accuracies.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat LeFevre, K., Terzi, E.: GraSS: graph structure summarization. In: SIAM International Conference on Data Mining SDM, pp. 454–465 (2010)CrossRef LeFevre, K., Terzi, E.: GraSS: graph structure summarization. In: SIAM International Conference on Data Mining SDM, pp. 454–465 (2010)CrossRef
2.
Zurück zum Zitat Riondato, M., García-Soriano, D., Bonchi, F.: Graph summarization with quality guarantees. In: IEEE International Conference on Data Mining ICDM, pp. 947–952 (2014) Riondato, M., García-Soriano, D., Bonchi, F.: Graph summarization with quality guarantees. In: IEEE International Conference on Data Mining ICDM, pp. 947–952 (2014)
3.
Zurück zum Zitat Storer, J.: Data compression. Elsevier, Amsterdam (1988) Storer, J.: Data compression. Elsevier, Amsterdam (1988)
4.
Zurück zum Zitat Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978)CrossRef Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978)CrossRef
5.
Zurück zum Zitat Navlakha, S., Rastogi, R., Shrivastava, N.: Graph summarization with bounded error. In: ACM International Conference on Management of Data SIGMOD, pp. 419–432 (2008) Navlakha, S., Rastogi, R., Shrivastava, N.: Graph summarization with bounded error. In: ACM International Conference on Management of Data SIGMOD, pp. 419–432 (2008)
6.
Zurück zum Zitat Khan, K., Nawaz, W., Lee, Y.: Set-based approximate approach for lossless graph summarization. Computing 97(12), 1185–1207 (2015)MathSciNetCrossRef Khan, K., Nawaz, W., Lee, Y.: Set-based approximate approach for lossless graph summarization. Computing 97(12), 1185–1207 (2015)MathSciNetCrossRef
7.
Zurück zum Zitat Koutra, D., Kang, U., Vreeken, J., Faloutsos, C.: VOG: summarizing and understanding large graphs. In: SIAM International Conference on Data Mining SDM, pp. 91–99 (2014)CrossRef Koutra, D., Kang, U., Vreeken, J., Faloutsos, C.: VOG: summarizing and understanding large graphs. In: SIAM International Conference on Data Mining SDM, pp. 91–99 (2014)CrossRef
8.
Zurück zum Zitat Zhuang, H., Rahman, R., Hu, X., Guo, T., Hui, P., Aberer, K.: Data summarization with social contexts. In: ACM International Conference on Information and Knowledge Management CIKM, pp. 397–406 (2016) Zhuang, H., Rahman, R., Hu, X., Guo, T., Hui, P., Aberer, K.: Data summarization with social contexts. In: ACM International Conference on Information and Knowledge Management CIKM, pp. 397–406 (2016)
9.
Zurück zum Zitat Toivonen, H., Zhou, F., Hartikainen, A., Hinkka, A.: Compression of weighted graphs. In: ACM International Conference on Knowledge Discovery and Data Mining SIGKDD, pp. 965–973 (2011) Toivonen, H., Zhou, F., Hartikainen, A., Hinkka, A.: Compression of weighted graphs. In: ACM International Conference on Knowledge Discovery and Data Mining SIGKDD, pp. 965–973 (2011)
10.
Zurück zum Zitat Fan, W., Li, J., Wang, X., Wu, Y.: Query preserving graph compression. In: ACM International Conference on Management of Data SIGMOD, pp. 157–168 (2012) Fan, W., Li, J., Wang, X., Wu, Y.: Query preserving graph compression. In: ACM International Conference on Management of Data SIGMOD, pp. 157–168 (2012)
11.
Zurück zum Zitat Liu, Z., Yu, J.X., Cheng, H.: Approximate homogeneous graph summarization. J. Inf. Process. 20(1), 77–88 (2012) Liu, Z., Yu, J.X., Cheng, H.: Approximate homogeneous graph summarization. J. Inf. Process. 20(1), 77–88 (2012)
12.
Zurück zum Zitat Boldi, P., Vigna, S.: The webgraph framework I: compression techniques. In: International Conference on World Wide Web WWW, pp. 595–602 (2004) Boldi, P., Vigna, S.: The webgraph framework I: compression techniques. In: International Conference on World Wide Web WWW, pp. 595–602 (2004)
13.
Zurück zum Zitat Adler, M., Mitzenmacher, M.: Towards compressing web graphs. In: Data Compression Conference DCC, pp. 203–212 (2001) Adler, M., Mitzenmacher, M.: Towards compressing web graphs. In: Data Compression Conference DCC, pp. 203–212 (2001)
14.
Zurück zum Zitat Chierichetti, F., Kumar, R., Lattanzi, S., Mitzenmacher, M., Panconesi, A., Raghavan, P.: On compressing social networks. In: ACM International Conference on Knowledge Discovery and Data Mining SIGKDD, pp. 219–228 (2009) Chierichetti, F., Kumar, R., Lattanzi, S., Mitzenmacher, M., Panconesi, A., Raghavan, P.: On compressing social networks. In: ACM International Conference on Knowledge Discovery and Data Mining SIGKDD, pp. 219–228 (2009)
16.
Zurück zum Zitat Newman, M.E., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)CrossRef Newman, M.E., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)CrossRef
17.
Zurück zum Zitat Clauset, A., Newman, M.E., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 066111 (2004)CrossRef Clauset, A., Newman, M.E., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 066111 (2004)CrossRef
18.
Zurück zum Zitat White, S., Smyth, P.: A spectral clustering approach to finding communities in graphs. In: SIAM International Conference on Data Mining SDM, pp. 274–285 (2005)CrossRef White, S., Smyth, P.: A spectral clustering approach to finding communities in graphs. In: SIAM International Conference on Data Mining SDM, pp. 274–285 (2005)CrossRef
19.
Zurück zum Zitat Flake, G.W., Lawrence, S., Giles, C.L.: Efficient identification of web communities. In: ACM International Conference on Knowledge Discovery and Data Mining SIGKDD, pp. 150–160 (2000) Flake, G.W., Lawrence, S., Giles, C.L.: Efficient identification of web communities. In: ACM International Conference on Knowledge Discovery and Data Mining SIGKDD, pp. 150–160 (2000)
20.
Zurück zum Zitat Motwani, R., Raghavan, P.: Randomized Algorithms. Chapman & Hall/CRC, Boca Raton (2010)MATH Motwani, R., Raghavan, P.: Randomized Algorithms. Chapman & Hall/CRC, Boca Raton (2010)MATH
21.
Zurück zum Zitat Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)MathSciNetCrossRef Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)MathSciNetCrossRef
22.
Zurück zum Zitat Wong, C.K., Easton, M.C.: An efficient method for weighted sampling without replacement. SIAM J. Comput. 9(1), 111–113 (1980)MathSciNetCrossRef Wong, C.K., Easton, M.C.: An efficient method for weighted sampling without replacement. SIAM J. Comput. 9(1), 111–113 (1980)MathSciNetCrossRef
Metadaten
Titel
Scalable Approximation Algorithm for Graph Summarization
verfasst von
Maham Anwar Beg
Muhammad Ahmad
Arif Zaman
Imdadullah Khan
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-319-93040-4_40