
Computing

Published: 01.12.2015

Set-based approximate approach for lossless graph summarization

Authors: Kifayat Ullah Khan, Waqas Nawaz, Young-Koo Lee

Published in: Computing | Issue 12/2015


Abstract

Graph summarization is a valuable approach for analyzing real-life phenomena such as communities, influential nodes, and information flow in large graphs. To summarize a graph, nodes with similar neighbors are merged into super nodes, and their corresponding edges are compressed into super edges. Existing methods find similar nodes either by ordering the nodes or by performing pairwise similarity computations. Compression-by-node-ordering approaches are scalable but achieve less compression than their pairwise counterparts, which rely on exhaustive similarity computations. In this paper, we propose a novel set-based summarization approach that directly summarizes naturally occurring sets of similar nodes in a graph. Our approach is scalable because it avoids explicit similarity computations with non-similar nodes and merges whole sets of nodes in each iteration. It also achieves a good compression ratio because each set consists of highly similar nodes. To locate sets of similar nodes, we find candidate sets using locality-sensitive hashing. However, the members of a candidate set vary in how similar they are to one another. We therefore propose a heuristic based on the similarity of the candidate nodes' degrees, together with a parameter-free pruning technique, to identify the subset of highly similar nodes among the candidates. Experiments on real-world graphs show that our approach requires less execution time than pairwise graph summarization, by an order of magnitude on graphs whose nodes have highly diverse neighborhoods, while producing summaries of similar accuracy. Our approach also scales comparably to the compression-by-node-ordering method while providing a better compression ratio.
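The candidate-generation step can be illustrated with MinHash-based locality-sensitive hashing over neighbor sets, the textbook LSH scheme for Jaccard similarity. The sketch below is a minimal illustration under stated assumptions: each node is represented by its neighbor set, similarity is Jaccard similarity, and candidates come from the standard banding technique. The function names `minhash_signature` and `lsh_candidate_sets`, the number of hash functions, and the band count are hypothetical choices, not the authors' settings. The degree-based heuristic and the pruning technique described above are not reproduced here.

```python
import random
from collections import defaultdict

# Assumption: a large prime for the linear hash family h(n) = (a*n + b) mod p.
# Node ids are assumed to be non-negative integers smaller than this prime.
PRIME = 2_147_483_647


def minhash_signature(neighbors, hash_params):
    """MinHash signature of a neighbor set: the minimum hash value per function."""
    return tuple(min((a * n + b) % PRIME for n in neighbors) for a, b in hash_params)


def lsh_candidate_sets(adj, num_hashes=20, bands=10, seed=0):
    """Group nodes whose neighbor-set signatures collide in at least one band.

    adj: dict mapping node id -> set of neighbor ids (hypothetical input format).
    Returns a list of candidate sets (frozensets with at least two nodes).
    """
    if num_hashes % bands != 0:
        raise ValueError("num_hashes must be divisible by bands")
    rows = num_hashes // bands
    rng = random.Random(seed)
    hash_params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
                   for _ in range(num_hashes)]

    buckets = defaultdict(set)
    for node, nbrs in adj.items():
        if not nbrs:  # isolated nodes have no signature; skip them
            continue
        sig = minhash_signature(nbrs, hash_params)
        for b in range(bands):
            # Nodes that agree on every row of band b land in the same bucket.
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(node)

    # The same group can appear in several bands; frozensets deduplicate it.
    return list({frozenset(g) for g in buckets.values() if len(g) > 1})
```

For example, in an undirected graph where nodes 1, 2, and 3 all link to 10, 11, and 12, and node 4 links to 20 and 21, the candidate sets are {1, 2, 3}, {10, 11, 12}, and {20, 21}, because identical neighbor sets always produce identical signatures. Using fewer rows per band raises the chance that merely similar (not identical) nodes collide, at the cost of more false-positive candidates, which is why a refinement step such as the paper's pruning is needed afterwards.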



Title: Set-based approximate approach for lossless graph summarization
Authors: Kifayat Ullah Khan
Waqas Nawaz
Young-Koo Lee
Publication date: 01.12.2015
Publisher: Springer Vienna
Published in: Computing / Issue 12/2015
Print ISSN: 0010-485X
Electronic ISSN: 1436-5057
DOI: https://doi.org/10.1007/s00607-015-0454-9
