Skip to main content
Erschienen in: The Journal of Supercomputing 10/2020

15.01.2018

An effective graph summarization and compression technique for a large-scaled graph

verfasst von: Hojin Seo, Kisung Park, Yongkoo Han, Hyunwook Kim, Muhammad Umair, Kifayat Ullah Khan, Young-Koo Lee

Erschienen in: The Journal of Supercomputing | Ausgabe 10/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Graphs are widely used in various applications, and their size is becoming larger over the passage of time. It is necessary to reduce their size to minimize main memory needs and to save the storage space on disk. For these purposes, graph summarization and compression approaches have been studied in various existing studies to reduce the size of a large graph. Graph summarization aggregates nodes having similar structural properties to represent a graph with reduced main memory requirements. Whereas graph compression applies various encoding techniques so that the resultant graph needs lesser storage space on disk. Considering usefulness of both the paradigms, we propose to obtain best of the both worlds by combining summarization and compression approaches. Hence, we present a greedy-based algorithm that greatly reduces the size of a large graph by applying both the compression and summarization. We also propose a novel cost model for calculating the compression ratio considering both the compression and summarization strategies. The algorithm uses the proposed cost model to determine whether to perform one or both of them in every iteration. Through comprehensive experiments on real-world datasets, we show that our proposed algorithm achieves a better compression ratio than only applying summarization approaches by up to 16%.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Koutra D, Kang U, Vreeken J, Faloutsos C (2014) VoG: summarizing and understanding large graphs. In: Proceedings of the 2014 SIAM International Conference on Data Mining, pp 91–99 Koutra D, Kang U, Vreeken J, Faloutsos C (2014) VoG: summarizing and understanding large graphs. In: Proceedings of the 2014 SIAM International Conference on Data Mining, pp 91–99
2.
Zurück zum Zitat Toivonen H, Zhou F, Hartikainen A, Hinkka A (2011) Compression of weighted graphs. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp 965–973 Toivonen H, Zhou F, Hartikainen A, Hinkka A (2011) Compression of weighted graphs. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp 965–973
3.
Zurück zum Zitat Khan KU, Nawaz W, Lee YK (2015) Set-based approximate approach for lossless graph summarization. Computing 97(12):1185–1207MathSciNetCrossRef Khan KU, Nawaz W, Lee YK (2015) Set-based approximate approach for lossless graph summarization. Computing 97(12):1185–1207MathSciNetCrossRef
5.
Zurück zum Zitat Khan KU (2015) Set-based approach for lossless graph summarization using locality sensitive hashing. In: 31st IEEE International Conference on Data Engineering Workshops (ICDEW), 2015. IEEE, pp 255–259 Khan KU (2015) Set-based approach for lossless graph summarization using locality sensitive hashing. In: 31st IEEE International Conference on Data Engineering Workshops (ICDEW), 2015. IEEE, pp 255–259
6.
Zurück zum Zitat LeFevre K, Terzi E (2010) Grass: graph structure summarization. In: Proceedings of the SIAM International Conference on Data Mining, SDM 2010, Columbus, pp 454–465 LeFevre K, Terzi E (2010) Grass: graph structure summarization. In: Proceedings of the SIAM International Conference on Data Mining, SDM 2010, Columbus, pp 454–465
8.
Zurück zum Zitat Shi L, Tong H, Tang J, Lin C (2015) Vegas: visual influence graph summarization on citation networks. IEEE Trans Knowl Data Eng 27(12):3417–3431CrossRef Shi L, Tong H, Tang J, Lin C (2015) Vegas: visual influence graph summarization on citation networks. IEEE Trans Knowl Data Eng 27(12):3417–3431CrossRef
9.
Zurück zum Zitat Navlakha S, Rastogi R, Shrivastava N (2008) Graph summarization with bounded error. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of data. ACM, pp 419–432 Navlakha S, Rastogi R, Shrivastava N (2008) Graph summarization with bounded error. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of data. ACM, pp 419–432
10.
Zurück zum Zitat Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions—I. Math Program 14(1):265–294MathSciNetCrossRef Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions—I. Math Program 14(1):265–294MathSciNetCrossRef
11.
Zurück zum Zitat Liakos P, Papakonstantinopoulou K, Sioutis M (2014) Pushing the envelope in graph compression. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management. ACM Liakos P, Papakonstantinopoulou K, Sioutis M (2014) Pushing the envelope in graph compression. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management. ACM
12.
Zurück zum Zitat Tian Y, Hankins RA, Patel JM (2008) Efficient aggregation for graph summarization. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp 567–580 Tian Y, Hankins RA, Patel JM (2008) Efficient aggregation for graph summarization. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp 567–580
13.
Zurück zum Zitat Tang N, Chen Q, Mitra P (2016) Graph stream summarization: from big bang to big crunch. In: Proceedings of the 2016 International Conference on Management of Data. ACM Tang N, Chen Q, Mitra P (2016) Graph stream summarization: from big bang to big crunch. In: Proceedings of the 2016 International Conference on Management of Data. ACM
14.
Zurück zum Zitat Boldi P, Vigna S (2004) The webgraph framework I: compression techniques. In: Proceedings of the 13th International Conference on World Wide Web. ACM, pp 595–602 Boldi P, Vigna S (2004) The webgraph framework I: compression techniques. In: Proceedings of the 13th International Conference on World Wide Web. ACM, pp 595–602
15.
Zurück zum Zitat Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P (2009) On compressing social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 219–228 Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P (2009) On compressing social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 219–228
16.
Zurück zum Zitat Maserrat H, Pei J (2010) Neighbor query friendly compression of social networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Maserrat H, Pei J (2010) Neighbor query friendly compression of social networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
17.
Zurück zum Zitat Hernandez C, Navarro G (2014) Compressed representations for web and social graphs. Knowl Inf Syst 40(2):279CrossRef Hernandez C, Navarro G (2014) Compressed representations for web and social graphs. Knowl Inf Syst 40(2):279CrossRef
18.
Zurück zum Zitat Wu K, Shoshani A, Otoo E (2004) U.S. Patent No. 6,831,575. U.S. Patent and Trademark Office, Washington, DC Wu K, Shoshani A, Otoo E (2004) U.S. Patent No. 6,831,575. U.S. Patent and Trademark Office, Washington, DC
20.
Zurück zum Zitat Faloutsos C, Megalooikonomou V (2007) On data mining, compression and Kolmogorov complexity. Data Min Knowl Discov 15:3–20MathSciNetCrossRef Faloutsos C, Megalooikonomou V (2007) On data mining, compression and Kolmogorov complexity. Data Min Knowl Discov 15:3–20MathSciNetCrossRef
21.
Zurück zum Zitat Seo H, Kim H, Park K, Han Y, Lee YK (2015) Summarization technique on a compressed graph for massive graph analysis. Korean Soc Big Data Serv 2(1):25–35 Seo H, Kim H, Park K, Han Y, Lee YK (2015) Summarization technique on a compressed graph for massive graph analysis. Korean Soc Big Data Serv 2(1):25–35
22.
Zurück zum Zitat Otoo EJ, Shosahni A, Nordberg H (2001) Notes on design and implementation of compressed bit vectors. Lawrence Berkeley National Laboratory, Berkeley Otoo EJ, Shosahni A, Nordberg H (2001) Notes on design and implementation of compressed bit vectors. Lawrence Berkeley National Laboratory, Berkeley
23.
Zurück zum Zitat Lim Y, Kang U, Faloutsos C (2014) Slashburn: graph compression and mining beyond caveman communities. IEEE Trans Knowl Data Eng 26(12):3077–3089CrossRef Lim Y, Kang U, Faloutsos C (2014) Slashburn: graph compression and mining beyond caveman communities. IEEE Trans Knowl Data Eng 26(12):3077–3089CrossRef
24.
Zurück zum Zitat van Schaik SJ, de Moor O (2011) A memory efficient reachability data structure through bit vector compression. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data. ACM van Schaik SJ, de Moor O (2011) A memory efficient reachability data structure through bit vector compression. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data. ACM
25.
Zurück zum Zitat Riondato M, Garcia-Soriano D, Bonchi F (2014) Graph summarization with quality guarantees. In: 2014 IEEE International Conference on Data Mining (ICDM). IEEE, pp 947–952 Riondato M, Garcia-Soriano D, Bonchi F (2014) Graph summarization with quality guarantees. In: 2014 IEEE International Conference on Data Mining (ICDM). IEEE, pp 947–952
26.
Zurück zum Zitat Liu W, Kan A, Chan J, Bailey J, Leckie C, Pei J, Kotagiri R (2012) On compressing weighted time-evolving graphs. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM, pp 2319–2322 Liu W, Kan A, Chan J, Bailey J, Leckie C, Pei J, Kotagiri R (2012) On compressing weighted time-evolving graphs. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM, pp 2319–2322
27.
Zurück zum Zitat Khan KU et al (2017) Faster compression methods for a weighted graph using locality sensitive hashing. Inf Sci 421:237–253MathSciNetCrossRef Khan KU et al (2017) Faster compression methods for a weighted graph using locality sensitive hashing. Inf Sci 421:237–253MathSciNetCrossRef
28.
Zurück zum Zitat Zhang N, Tian Y, Patel JM (2010) Discovery-driven graph summarization. In: 2010 IEEE 26th International Conference on Data Engineering (ICDE). IEEE, pp 880–891 Zhang N, Tian Y, Patel JM (2010) Discovery-driven graph summarization. In: 2010 IEEE 26th International Conference on Data Engineering (ICDE). IEEE, pp 880–891
29.
Zurück zum Zitat Khan KU, Nawaz W, Lee YK (2017) Set-based unified approach for summarization of a multi-attributed graph. World Wide Web 20(3):543–570CrossRef Khan KU, Nawaz W, Lee YK (2017) Set-based unified approach for summarization of a multi-attributed graph. World Wide Web 20(3):543–570CrossRef
31.
Zurück zum Zitat Rissanen J (1978) Modeling by shortest data description. Automatica 14(5):465–471CrossRef Rissanen J (1978) Modeling by shortest data description. Automatica 14(5):465–471CrossRef
Metadaten
Titel
An effective graph summarization and compression technique for a large-scaled graph
verfasst von
Hojin Seo
Kisung Park
Yongkoo Han
Hyunwook Kim
Muhammad Umair
Kifayat Ullah Khan
Young-Koo Lee
Publikationsdatum
15.01.2018
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 10/2020
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-018-2245-5

Weitere Artikel der Ausgabe 10/2020

The Journal of Supercomputing 10/2020 Zur Ausgabe

Premium Partner