Skip to main content
Erschienen in: Computing 12/2015

01.12.2015

Set-based approximate approach for lossless graph summarization

verfasst von: Kifayat Ullah Khan, Waqas Nawaz, Young-Koo Lee

Erschienen in: Computing | Ausgabe 12/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Graph summarization is valuable approach to analyze various real life phenomenon, like communities, influential nodes, and information flow in a big graph. To summarize a graph, nodes having similar neighbors are merged into super nodes and their corresponding edges are compressed into super edges. Existing methods find similar nodes either by nodes ordering or perform pairwise similarity computations. Compression-by-node ordering approaches are scalable but provide lesser compression due to exhaustive similarity computations of their counterparts. In this paper, we propose a novel set-based summarization approach that directly summarizes naturally occurring sets of similar nodes in a graph. Our approach is scalable since we avoid explicit similarity computations with non-similar nodes and merge sets of nodes in each iteration. Similarly, we provide good compression ratio as each set consists of highly similar nodes. To locate sets of similar nodes, we find candidate sets of similar nodes by using locality sensitive hashing. However, member nodes of every candidate set have varying similarities with each other. Therefore, we propose a heuristic based on similarity among degrees of candidate nodes, and a parameter-free pruning technique to effectively identify subset of highly similar nodes from candidate nodes. Through experiments on real world graphs, our approach requires lesser execution time than pairwise graph summarization, with margin of an order of magnitude in graphs containing nodes with highly diverse neighborhood, and produces summary at similar accuracy. Similarly, we observe comparable scalability against the compression-by-node ordering method, while providing better compression ratio.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Fußnoten
Literatur
1.
Zurück zum Zitat Boldi P, Vigna S (2004) The webgraph framework i: compression techniques. In: Proceedings of the 13th international conference on World Wide Web, ACM, pp 595–602 Boldi P, Vigna S (2004) The webgraph framework i: compression techniques. In: Proceedings of the 13th international conference on World Wide Web, ACM, pp 595–602
2.
Zurück zum Zitat Broder AZ (1997) On the resemblance and containment of documents. In: Proceedings of Compression and Complexity of Sequences 1997. IEEE, pp 21–29 Broder AZ (1997) On the resemblance and containment of documents. In: Proceedings of Compression and Complexity of Sequences 1997. IEEE, pp 21–29
3.
Zurück zum Zitat Buttler D (2004) A short survey of document structure similarity algorithms. In: International conference on internet computing, pp 3–9 Buttler D (2004) A short survey of document structure similarity algorithms. In: International conference on internet computing, pp 3–9
4.
Zurück zum Zitat Chen C, Yan X, Zhu F, Han J, Yu PS (2008) Graph olap: towards online analytical processing on graphs. In: Eighth IEEE international conference on data mining, 2008. ICDM’08, IEEE, pp 103–112 Chen C, Yan X, Zhu F, Han J, Yu PS (2008) Graph olap: towards online analytical processing on graphs. In: Eighth IEEE international conference on data mining, 2008. ICDM’08, IEEE, pp 103–112
5.
Zurück zum Zitat Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P (2009) On compressing social networks. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 219–228 Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P (2009) On compressing social networks. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 219–228
6.
Zurück zum Zitat Chum O, Philbin J, Isard M, Zisserman A (2007) Scalable near identical image and shot detection. In: Proceedings of the 6th ACM international conference on Image and video retrieval, ACM, pp 549–556 Chum O, Philbin J, Isard M, Zisserman A (2007) Scalable near identical image and shot detection. In: Proceedings of the 6th ACM international conference on Image and video retrieval, ACM, pp 549–556
7.
Zurück zum Zitat Ghazizadeh S, Chawathe SS (2002) Seus: structure extraction using summaries. In: Discovery science. Springer, Berlin, pp 71–85 Ghazizadeh S, Chawathe SS (2002) Seus: structure extraction using summaries. In: Discovery science. Springer, Berlin, pp 71–85
8.
Zurück zum Zitat Gorisse D, Cord M, Precioso F (2012) Locality-sensitive hashing for chi2 distance. IEEE Trans Pattern Anal Mach Intell 34(2):402–409CrossRef Gorisse D, Cord M, Precioso F (2012) Locality-sensitive hashing for chi2 distance. IEEE Trans Pattern Anal Mach Intell 34(2):402–409CrossRef
9.
Zurück zum Zitat Hernández C, Navarro G (2013) Compressed representations for web and social graphs. In: Knowledge and information systems, pp 1–35 Hernández C, Navarro G (2013) Compressed representations for web and social graphs. In: Knowledge and information systems, pp 1–35
10.
Zurück zum Zitat Hjaltason GR, Samet H (2003) Index-driven similarity search in metric spaces (survey article). ACM Trans Database Syst (TODS) 28(4):517–580CrossRef Hjaltason GR, Samet H (2003) Index-driven similarity search in metric spaces (survey article). ACM Trans Database Syst (TODS) 28(4):517–580CrossRef
11.
Zurück zum Zitat Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the thirtieth annual ACM symposium on theory of computing, ACM, pp 604–613 Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the thirtieth annual ACM symposium on theory of computing, ACM, pp 604–613
12.
Zurück zum Zitat Kang U, Faloutsos C (2011) Beyond’caveman communities’: Hubs and spokes for graph compression and mining. In: 2011 IEEE 11th international conference on data mining (ICDM), IEEE, pp 300–309 Kang U, Faloutsos C (2011) Beyond’caveman communities’: Hubs and spokes for graph compression and mining. In: 2011 IEEE 11th international conference on data mining (ICDM), IEEE, pp 300–309
13.
Zurück zum Zitat Ketkar NS, Holder LB, Cook DJ (2005) Subdue: compression-based frequent pattern discovery in graph data. In: Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations, ACM, pp 71–76 Ketkar NS, Holder LB, Cook DJ (2005) Subdue: compression-based frequent pattern discovery in graph data. In: Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations, ACM, pp 71–76
14.
Zurück zum Zitat Koutra D, Kang U, Vreeken J, Faloutsos C (2014) VOG: summarizing and understanding large graphs. In: Proceedings of the 2014 SIAM international conference on data mining, Philadelphia, Pennsylvania, USA, April 24–26, 2014, pp 91–99. doi:10.1137/1.9781611973440.11 Koutra D, Kang U, Vreeken J, Faloutsos C (2014) VOG: summarizing and understanding large graphs. In: Proceedings of the 2014 SIAM international conference on data mining, Philadelphia, Pennsylvania, USA, April 24–26, 2014, pp 91–99. doi:10.​1137/​1.​9781611973440.​11
15.
Zurück zum Zitat LeFevre K, Terzi E (2010) Grass: graph structure summarization. Proceedings of the SIAM international conference on data mining, SDM 2010, April 29–May 1, 2010. Columbus, Ohio, pp 454–465 LeFevre K, Terzi E (2010) Grass: graph structure summarization. Proceedings of the SIAM international conference on data mining, SDM 2010, April 29–May 1, 2010. Columbus, Ohio, pp 454–465
16.
Zurück zum Zitat Liu S, Chen L, Ni LM, Fan J (2011) Cim: categorical influence maximization. In: Proceedings of the 5th international conference on ubiquitous information management and communication, ACM, p 124 Liu S, Chen L, Ni LM, Fan J (2011) Cim: categorical influence maximization. In: Proceedings of the 5th international conference on ubiquitous information management and communication, ACM, p 124
17.
Zurück zum Zitat Liu S, Wang S, Zhu F, Zhang J, Krishnan R (2014) Hydra: large-scale social identity linkage via heterogeneous behavior modeling. In: Proceedings of the 2014 ACM SIGMOD international conference on Management of data, ACM, pp 51–62 Liu S, Wang S, Zhu F, Zhang J, Krishnan R (2014) Hydra: large-scale social identity linkage via heterogeneous behavior modeling. In: Proceedings of the 2014 ACM SIGMOD international conference on Management of data, ACM, pp 51–62
18.
Zurück zum Zitat Macropol K, Singh A (2010) Scalable discovery of best clusters on large graphs. Proc VLDB Endowment 3(1–2):693–702CrossRef Macropol K, Singh A (2010) Scalable discovery of best clusters on large graphs. Proc VLDB Endowment 3(1–2):693–702CrossRef
19.
Zurück zum Zitat Mu Y, Yan S (2010) Non-metric locality-sensitive hashing. In: AAAI Mu Y, Yan S (2010) Non-metric locality-sensitive hashing. In: AAAI
20.
Zurück zum Zitat Nanopoulos A, Manolopoulos Y (2002) Efficient similarity search for market basket data. VLDB J 11(2):138–152CrossRef Nanopoulos A, Manolopoulos Y (2002) Efficient similarity search for market basket data. VLDB J 11(2):138–152CrossRef
21.
Zurück zum Zitat Navlakha S, Rastogi R, Shrivastava N (2008) Graph summarization with bounded error. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data, ACM, pp 419–432 Navlakha S, Rastogi R, Shrivastava N (2008) Graph summarization with bounded error. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data, ACM, pp 419–432
22.
Zurück zum Zitat Qu Q, Zhu F, Yan X, Han J, Philip SY, Li H (2011) Efficient topological olap on information networks. In: Database systems for advanced applications. Springer, Berlin, pp 389–403 Qu Q, Zhu F, Yan X, Han J, Philip SY, Li H (2011) Efficient topological olap on information networks. In: Database systems for advanced applications. Springer, Berlin, pp 389–403
23.
Zurück zum Zitat Qu Q, Liu S, Jensen CS, Zhu F, Faloutsos C (2014) Interestingness-driven diffusion process summarization in dynamic networks. In: Machine learning and knowledge discovery in databases. Springer, Berlin, pp 597–613 Qu Q, Liu S, Jensen CS, Zhu F, Faloutsos C (2014) Interestingness-driven diffusion process summarization in dynamic networks. In: Machine learning and knowledge discovery in databases. Springer, Berlin, pp 597–613
24.
Zurück zum Zitat Rajaraman A, Ullman JD (2011) Mining of massive datasets. Cambridge University Press, Cambridge Rajaraman A, Ullman JD (2011) Mining of massive datasets. Cambridge University Press, Cambridge
25.
26.
Zurück zum Zitat Song J, Yang Y, Yang Y, Huang Z, Shen HT (2013) Inter-media hashing for large-scale retrieval from heterogeneous data sources. In: Proceedings of the 2013 ACM SIGMOD international conference on management of data, ACM, pp 785–796 Song J, Yang Y, Yang Y, Huang Z, Shen HT (2013) Inter-media hashing for large-scale retrieval from heterogeneous data sources. In: Proceedings of the 2013 ACM SIGMOD international conference on management of data, ACM, pp 785–796
27.
Zurück zum Zitat Tian Y, Hankins RA, Patel JM (2008) Efficient aggregation for graph summarization. In: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, ACM, pp 567–580 Tian Y, Hankins RA, Patel JM (2008) Efficient aggregation for graph summarization. In: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, ACM, pp 567–580
28.
Zurück zum Zitat Toivonen H, Zhou F, Hartikainen A, Hinkka A (2011) Compression of weighted graphs. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 965–973 Toivonen H, Zhou F, Hartikainen A, Hinkka A (2011) Compression of weighted graphs. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 965–973
29.
Zurück zum Zitat Tomar VS, Rose RC (2013) Efficient manifold learning for speech recognition using locality sensitive hashing. In: 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 6995–6999 Tomar VS, Rose RC (2013) Efficient manifold learning for speech recognition using locality sensitive hashing. In: 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 6995–6999
32.
Zurück zum Zitat Yan X, Han J (2002) gspan: Graph-based substructure pattern mining. In: 2002 IEEE international conference on data mining, 2002. ICDM 2003. Proceedings, IEEE, pp 721–724 Yan X, Han J (2002) gspan: Graph-based substructure pattern mining. In: 2002 IEEE international conference on data mining, 2002. ICDM 2003. Proceedings, IEEE, pp 721–724
33.
Zurück zum Zitat Yan X, Zhu F, Yu PS, Han J (2006) Feature-based similarity search in graph structures. ACM Trans Database Syst (TODS) 31(4):1418–1453CrossRef Yan X, Zhu F, Yu PS, Han J (2006) Feature-based similarity search in graph structures. ACM Trans Database Syst (TODS) 31(4):1418–1453CrossRef
34.
Zurück zum Zitat Yin M, Wu B, Zeng Z (2012) Hmgraph olap: a novel framework for multi-dimensional heterogeneous network analysis. In: Proceedings of the fifteenth international workshop on Data warehousing and OLAP, ACM, pp 137–144 Yin M, Wu B, Zeng Z (2012) Hmgraph olap: a novel framework for multi-dimensional heterogeneous network analysis. In: Proceedings of the fifteenth international workshop on Data warehousing and OLAP, ACM, pp 137–144
35.
Zurück zum Zitat Zhao P, Li X, Xin D, Han J (2011) Graph cube: on warehousing and olap multidimensional networks. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, ACM, pp 853–864 Zhao P, Li X, Xin D, Han J (2011) Graph cube: on warehousing and olap multidimensional networks. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, ACM, pp 853–864
36.
Zurück zum Zitat Zhu F, Zhang Z, Qu Q (2013) A direct mining approach to efficient constrained graph pattern discovery. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, ACM, pp 821–832 Zhu F, Zhang Z, Qu Q (2013) A direct mining approach to efficient constrained graph pattern discovery. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, ACM, pp 821–832
Metadaten
Titel
Set-based approximate approach for lossless graph summarization
verfasst von
Kifayat Ullah Khan
Waqas Nawaz
Young-Koo Lee
Publikationsdatum
01.12.2015
Verlag
Springer Vienna
Erschienen in
Computing / Ausgabe 12/2015
Print ISSN: 0010-485X
Elektronische ISSN: 1436-5057
DOI
https://doi.org/10.1007/s00607-015-0454-9

Premium Partner