Skip to main content
Erschienen in: Knowledge and Information Systems 2/2014

01.08.2014 | Regular Paper

Compressed representations for web and social graphs

verfasst von: Cecilia Hernández, Gonzalo Navarro

Erschienen in: Knowledge and Information Systems | Ausgabe 2/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Compressed representations have become effective to store and access large Web and social graphs, in order to support various graph querying and mining tasks. The existing representations exploit various typical patterns in those networks and provide basic navigation support. In this paper, we obtain unprecedented results by finding “dense subgraph” patterns and combining them with techniques such as node orderings and compact data structures. On those representations, we support out-neighbor and out/in-neighbor queries, as well as mining queries based on the dense subgraphs. First, we propose a compression scheme for Web graphs that reduces edges by representing dense subgraphs with “virtual nodes”; over this scheme, we apply node orderings and other compression techniques. With this approach, we match the best current compression ratios that support out-neighbor queries (i.e., nodes pointed from a given node), using 1.0–1.8 bits per edge (bpe) on large Web graphs, and retrieving each neighbor of a node in 0.6–1.0 microseconds (\(\upmu \)s). When supporting both out- and in-neighbor queries, instead, our technique generally offers the best time when using little space. If the reduced graph, instead, is represented with a compact data structure that supports bidirectional navigation, we obtain the most compact Web graph representations (0.9–1.5 bpe) that support out/in-neighbor navigation; yet, the time per neighbor extracted raises to around 5–20 \(\upmu \)s. We also propose a compact data structure that represents dense subgraphs without using virtual nodes. It allows us to recover out/in-neighbors and answer other more complex queries on the dense subgraphs identified. This structure is not competitive on Web graphs, but on social networks, it achieves 4–13 bpe and 8–12 \(\upmu \)s per out/in-neighbor retrieved, which improves upon all existing representations.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Adler M, Mitzenmacher M (2001) Towards compressing web graphs. In: Proceedings of the data compression conference (DCC). Snowbird, UT, pp 203–212 Adler M, Mitzenmacher M (2001) Towards compressing web graphs. In: Proceedings of the data compression conference (DCC). Snowbird, UT, pp 203–212
3.
Zurück zum Zitat Anh V, Moffat A (2010) Local modeling for webgraph compression. In: Proceedings of the data compression conference (DCC). Snowbird UT, p 519 Anh V, Moffat A (2010) Local modeling for webgraph compression. In: Proceedings of the data compression conference (DCC). Snowbird UT, p 519
5.
Zurück zum Zitat Bader D, Madduri K (2005) Design and implementation of the HPCS graph analysis benchmark on symmetric multiprocessors. In: Proceedings of the 12th international high performance computing (HiPC). Goa, India, pp 465–476 Bader D, Madduri K (2005) Design and implementation of the HPCS graph analysis benchmark on symmetric multiprocessors. In: Proceedings of the 12th international high performance computing (HiPC). Goa, India, pp 465–476
6.
Zurück zum Zitat Becchetti L, Castillo C, Donato D, Baeza-Yates R, Leonardi S (2008) Link analysis for web spam detection. ACM Trans Web 2(1):2CrossRef Becchetti L, Castillo C, Donato D, Baeza-Yates R, Leonardi S (2008) Link analysis for web spam detection. ACM Trans Web 2(1):2CrossRef
7.
Zurück zum Zitat Boldi P, Vigna S (2004) The Webgraph framework I: compression techniques. In: Proceedings of the 13th international conference on the world wide web (WWW), New York, NY, pp 595–602 Boldi P, Vigna S (2004) The Webgraph framework I: compression techniques. In: Proceedings of the 13th international conference on the world wide web (WWW), New York, NY, pp 595–602
8.
Zurück zum Zitat Boldi P, Santini M, Vigna S (2009) Permuting web graph. In: The 6th workshop on algorithms and models for the web graph (WAW), Barcelona, Spain, pp 116–126 Boldi P, Santini M, Vigna S (2009) Permuting web graph. In: The 6th workshop on algorithms and models for the web graph (WAW), Barcelona, Spain, pp 116–126
9.
Zurück zum Zitat Boldi P, Rosa M, Santini M, Vigna S (2011) Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th international conference on world wide web (WWW), Hyderabad, India, pp 587–596 Boldi P, Rosa M, Santini M, Vigna S (2011) Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th international conference on world wide web (WWW), Hyderabad, India, pp 587–596
10.
Zurück zum Zitat Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw 1(7):107–117 Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw 1(7):107–117
11.
Zurück zum Zitat Brisaboa N, Ladra S, Navarro G (2009) K2-trees for compact web graph representation. In: Proceedings of the 16th international symposium on string processing and information retrieval (SPIRE), Saariselkä, Finland, pp 18–30 Brisaboa N, Ladra S, Navarro G (2009) K2-trees for compact web graph representation. In: Proceedings of the 16th international symposium on string processing and information retrieval (SPIRE), Saariselkä, Finland, pp 18–30
12.
Zurück zum Zitat Brisaboa N, Ladra S, Navarro G (2012) Personal communication including code Brisaboa N, Ladra S, Navarro G (2012) Personal communication including code
13.
Zurück zum Zitat Broder A (2000) Min-wise independent permutations: theory and practice. In: Proceedings of the 27th international colloquium on automata, languages and programming (ICALP), Geneva, Italy, p 808 Broder A (2000) Min-wise independent permutations: theory and practice. In: Proceedings of the 27th international colloquium on automata, languages and programming (ICALP), Geneva, Italy, p 808
14.
Zurück zum Zitat Brohée S, Van Helden J (2006) Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics 7:488CrossRef Brohée S, Van Helden J (2006) Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics 7:488CrossRef
15.
Zurück zum Zitat Bron C, Kerbosch J (1973) Finding all cliques of an undirected graph (Algorithm 457). Commun ACM 16(9):575–576CrossRefMATH Bron C, Kerbosch J (1973) Finding all cliques of an undirected graph (Algorithm 457). Commun ACM 16(9):575–576CrossRefMATH
16.
Zurück zum Zitat Buehrer G, Chellapilla K (2008) A scalable pattern mining approach to web graph compression with communities. In: Proceedings of the international conference on web search and web data mining (WSDM), Palo Alto, CA, pp 95–106 Buehrer G, Chellapilla K (2008) A scalable pattern mining approach to web graph compression with communities. In: Proceedings of the international conference on web search and web data mining (WSDM), Palo Alto, CA, pp 95–106
17.
Zurück zum Zitat Cha M, Mislove A, Gummadi P (2009) A measurement-driven analysis of information propagation in the Flickr social networking. In: Proceedings of the 20th international conference on world wide web (WWW), Madrid, Spain, pp 721–730 Cha M, Mislove A, Gummadi P (2009) A measurement-driven analysis of information propagation in the Flickr social networking. In: Proceedings of the 20th international conference on world wide web (WWW), Madrid, Spain, pp 721–730
18.
Zurück zum Zitat Chakrabarti D, Zhan Y, Faloutsos C (2004) R-MAT: a recursive model for graph mining. In: Proceedings of the 4th SIAM international conference on data mining (SDM), Lake Buena Vista, FL Chakrabarti D, Zhan Y, Faloutsos C (2004) R-MAT: a recursive model for graph mining. In: Proceedings of the 4th SIAM international conference on data mining (SDM), Lake Buena Vista, FL
19.
Zurück zum Zitat Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P (2009) On compressing social networks. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD), Paris, France, pp 219–228 Chierichetti F, Kumar R, Lattanzi S, Mitzenmacher M, Panconesi A, Raghavan P (2009) On compressing social networks. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD), Paris, France, pp 219–228
20.
Zurück zum Zitat Claude F, Navarro F (2010) Extended compact web graph representations. In: Algorithms and applications. Lecture notes in computer science 6060. Springer, Berlin, pp 77–91 Claude F, Navarro F (2010) Extended compact web graph representations. In: Algorithms and applications. Lecture notes in computer science 6060. Springer, Berlin, pp 77–91
21.
Zurück zum Zitat Claude F, Navarro G (2010) Fast and compact web graph representations. ACM Trans Web 4(4):16 Claude F, Navarro G (2010) Fast and compact web graph representations. ACM Trans Web 4(4):16
22.
Zurück zum Zitat Claude F, Navarro G (2008) Practical rank/select queries over arbitrary sequences. In: Proceedings of the 15th international symposium on string processing and information retrieval (SPIRE), Melbourne, Australia, pp 176–187 Claude F, Navarro G (2008) Practical rank/select queries over arbitrary sequences. In: Proceedings of the 15th international symposium on string processing and information retrieval (SPIRE), Melbourne, Australia, pp 176–187
23.
Zurück zum Zitat Claude F, Ladra S (2011) Practical representations for web and social graphs. In: Proceedings of the 20th ACM conference on information and knowledge management (CIKM), Glasgow, UK, pp 1185–1190 Claude F, Ladra S (2011) Practical representations for web and social graphs. In: Proceedings of the 20th ACM conference on information and knowledge management (CIKM), Glasgow, UK, pp 1185–1190
24.
Zurück zum Zitat Clark D (1996) Compact Pat trees. Ph.D. Thesis, University of Waterloo, Canada Clark D (1996) Compact Pat trees. Ph.D. Thesis, University of Waterloo, Canada
25.
Zurück zum Zitat Demetrescu C, Finocchi I, Ribichini A (2006) Trading off space for passes in graph streaming problems. In: Proceedings of the 17th ACM-SIAM symposium on discrete algorithms (SODA), Miami, FL, pp 714–723 Demetrescu C, Finocchi I, Ribichini A (2006) Trading off space for passes in graph streaming problems. In: Proceedings of the 17th ACM-SIAM symposium on discrete algorithms (SODA), Miami, FL, pp 714–723
26.
Zurück zum Zitat Donato D, Millozzi S, Leonardi S, Tsaparas P (2005) Mining the inner structure of the web graph. In: Proceedings of the 8th workshop on the web and databases (WebDB), Baltimore, MD, pp 145–150 Donato D, Millozzi S, Leonardi S, Tsaparas P (2005) Mining the inner structure of the web graph. In: Proceedings of the 8th workshop on the web and databases (WebDB), Baltimore, MD, pp 145–150
27.
Zurück zum Zitat Dourisboure Y, Geraci F, Pellegrini M (2007) Extraction and classification of dense communities in the web. In: Proceedings of the 16th international conference on world wide web (WWW) Banff, Alberta, Canada, pp 461–470 Dourisboure Y, Geraci F, Pellegrini M (2007) Extraction and classification of dense communities in the web. In: Proceedings of the 16th international conference on world wide web (WWW) Banff, Alberta, Canada, pp 461–470
28.
Zurück zum Zitat Gibson D, Kumar R, Tomkins A (2005) Discovering large dense subgraphs in massive graphs. In: Proceedings of the 31st international conference on very large data bases (VLDB), Trondheim, Norway, pp 721–732 Gibson D, Kumar R, Tomkins A (2005) Discovering large dense subgraphs in massive graphs. In: Proceedings of the 31st international conference on very large data bases (VLDB), Trondheim, Norway, pp 721–732
29.
Zurück zum Zitat González R, Grabowski S, Mäkinen V, Navarro G (2005) Practical implementation of rank and select queries. In: Poster Proceedings of the volume of 4th workshop on efficient and experimental algorithms (WEA), Santorini Island, Greece, pp 27–38 González R, Grabowski S, Mäkinen V, Navarro G (2005) Practical implementation of rank and select queries. In: Poster Proceedings of the volume of 4th workshop on efficient and experimental algorithms (WEA), Santorini Island, Greece, pp 27–38
30.
Zurück zum Zitat Golynski A, Munro J, Rao S (2006) Rank/select operations on large alphabets: a tool for text indexing. In: Proceedings of the seventeenth annual ACM-SIAM symposium on discrete algorithms (SODA), Miami, FL, pp 368–373 Golynski A, Munro J, Rao S (2006) Rank/select operations on large alphabets: a tool for text indexing. In: Proceedings of the seventeenth annual ACM-SIAM symposium on discrete algorithms (SODA), Miami, FL, pp 368–373
31.
Zurück zum Zitat Grabowski S, Bieniecki W (2010) Tight and simple web graph compression. CoRR abs/006.0809 Grabowski S, Bieniecki W (2010) Tight and simple web graph compression. CoRR abs/006.0809
32.
Zurück zum Zitat Grabowski S, Bieniecki W (2011) Merging adjacency lists for efficient web graph compression. Adv Intell Soft Comput 103(1):385–392 Grabowski S, Bieniecki W (2011) Merging adjacency lists for efficient web graph compression. Adv Intell Soft Comput 103(1):385–392
33.
Zurück zum Zitat Grossi R, Gupta A, Vitter J (2003) High-order entropy-compressed text indexes. In: Proceedings of the 14th annual ACM-SIAM symposium on discrete algorithms (SODA), Baltimore, MD, pp 841–850 Grossi R, Gupta A, Vitter J (2003) High-order entropy-compressed text indexes. In: Proceedings of the 14th annual ACM-SIAM symposium on discrete algorithms (SODA), Baltimore, MD, pp 841–850
34.
Zurück zum Zitat Hasan M, Salem S, Zaki M (2011) SimClus: an effective algorithm for clustering with a lower bound on similarity. Knowl Inf Syst 28(3):665–685 Hasan M, Salem S, Zaki M (2011) SimClus: an effective algorithm for clustering with a lower bound on similarity. Knowl Inf Syst 28(3):665–685
35.
Zurück zum Zitat Hernández C, Navarro G (2011) Compression of web and social graphs supporting neighbor and community queries. In: Proceedings of the 6th ACM workshop on social network mining and analysis (SNAKDD), San Diego, CA Hernández C, Navarro G (2011) Compression of web and social graphs supporting neighbor and community queries. In: Proceedings of the 6th ACM workshop on social network mining and analysis (SNAKDD), San Diego, CA
36.
Zurück zum Zitat Hernández C, Navarro G (2012) Compressed representation of web and social networks via dense subgraphs. In: Proceedings of the 19th international symposium on string processing and information retrieval (SPIRE), Cartagena de Indias, Colombia, pp 264–276 Hernández C, Navarro G (2012) Compressed representation of web and social networks via dense subgraphs. In: Proceedings of the 19th international symposium on string processing and information retrieval (SPIRE), Cartagena de Indias, Colombia, pp 264–276
37.
Zurück zum Zitat Katarzyna M, Przemyslaw K, Piotr B (2009) User position measures in social networks. In: Proceedings of the 4th ACM workshop on social network mining and analysis (SNAKDD), Paris, France, pp 1–9 Katarzyna M, Przemyslaw K, Piotr B (2009) User position measures in social networks. In: Proceedings of the 4th ACM workshop on social network mining and analysis (SNAKDD), Paris, France, pp 1–9
39.
Zurück zum Zitat Kumar R, Raghavan P, Rajagopalan S, Tomkins A (1999) Trawling the web for emerging cyber-communities. Comput Netw 31(11):1481–1493CrossRef Kumar R, Raghavan P, Rajagopalan S, Tomkins A (1999) Trawling the web for emerging cyber-communities. Comput Netw 31(11):1481–1493CrossRef
40.
Zurück zum Zitat Larsson N, Moffat A (1999) Offline dictionary-based compression. In: Proceedings of the data compression conference (DCC), Snowbird, Utah, pp 296–305 Larsson N, Moffat A (1999) Offline dictionary-based compression. In: Proceedings of the data compression conference (DCC), Snowbird, Utah, pp 296–305
41.
Zurück zum Zitat Lee V, Ruan N, Jin R, Aggarwal C (2010) A survey of algorithms for dense subgraph discovery. Manag Min Graph Data 2010:303–336CrossRef Lee V, Ruan N, Jin R, Aggarwal C (2010) A survey of algorithms for dense subgraph discovery. Manag Min Graph Data 2010:303–336CrossRef
42.
Zurück zum Zitat Macropol K, Singh A (2010) Scalable discovery of best clusters on large graphs. PVLDB J 3(1):693–702 Macropol K, Singh A (2010) Scalable discovery of best clusters on large graphs. PVLDB J 3(1):693–702
43.
Zurück zum Zitat Maserrat H, Pei J (2010) Neighbor query friendly compression of social networks. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 533–542 Maserrat H, Pei J (2010) Neighbor query friendly compression of social networks. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 533–542
44.
Zurück zum Zitat Mcpherson J, Ma K, Ogawa M (2005) Discovering parametric clusters in social small-world graphs. In: Proceedings of the ACM symposium on applied computing, Santa Fe, New Mexico, USA Mcpherson J, Ma K, Ogawa M (2005) Discovering parametric clusters in social small-world graphs. In: Proceedings of the ACM symposium on applied computing, Santa Fe, New Mexico, USA
45.
Zurück zum Zitat Mislove A, Marcon M, Gummadi P, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the internet measurement conference (IMC), San Diego, CA, pp 29–42 Mislove A, Marcon M, Gummadi P, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the internet measurement conference (IMC), San Diego, CA, pp 29–42
46.
Zurück zum Zitat Mishra R, Shukla S, Arora D, Kumar M (2011) An effective comparison of graph clustering algorithms via random graphs. Int J Comput Appl 22(1):22–27 Mishra R, Shukla S, Arora D, Kumar M (2011) An effective comparison of graph clustering algorithms via random graphs. Int J Comput Appl 22(1):22–27
47.
Zurück zum Zitat Morik K, Kaspari A, Wurst M (2012) Multi-objective frequent termset clustering. Knowl Inf Syst 30(3):715–738CrossRef Morik K, Kaspari A, Wurst M (2012) Multi-objective frequent termset clustering. Knowl Inf Syst 30(3):715–738CrossRef
48.
Zurück zum Zitat Randall K, Stata R, Wiener J, Wickremesinghe R (2002) The link database: fast access to graphs of the web. In: Proceedings of the data compression conference (DCC), Snowbird, UT, pp 122–131 Randall K, Stata R, Wiener J, Wickremesinghe R (2002) The link database: fast access to graphs of the web. In: Proceedings of the data compression conference (DCC), Snowbird, UT, pp 122–131
49.
Zurück zum Zitat Raman R, Raman V, Rao S (2002) Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proceedings of the 13th annual ACM-SIAM symposium on discrete algorithms (SODA), San Francisco, CA, pp 233–242 Raman R, Raman V, Rao S (2002) Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proceedings of the 13th annual ACM-SIAM symposium on discrete algorithms (SODA), San Francisco, CA, pp 233–242
50.
Zurück zum Zitat Saito H, Toyoda M, Kitsuregawa M, Aihara K (2007) A large-scale study of link spam detection by graph algorithms. In: Proceedings of adversarial information retrieval on the web (AIRWeb), Banff, Alberta, Canada Saito H, Toyoda M, Kitsuregawa M, Aihara K (2007) A large-scale study of link spam detection by graph algorithms. In: Proceedings of adversarial information retrieval on the web (AIRWeb), Banff, Alberta, Canada
51.
Zurück zum Zitat Saito K, Kimura M, Ohara K, Motoda H (2012) Efficient discovery of influential nodes for SIS models in social networks. Knowl Inf Syst 30(3):613–635CrossRef Saito K, Kimura M, Ohara K, Motoda H (2012) Efficient discovery of influential nodes for SIS models in social networks. Knowl Inf Syst 30(3):613–635CrossRef
52.
Zurück zum Zitat Suel T, Yuan J (2001) Compressing the graph structure of the web. In: Proceedings of the data compression conference (DCC), Snowbird, UT, pp 213–222 Suel T, Yuan J (2001) Compressing the graph structure of the web. In: Proceedings of the data compression conference (DCC), Snowbird, UT, pp 213–222
53.
Zurück zum Zitat Suri S, Vassilvitskii S (2011) Counting triangles and the curse of the last reducer. In: Proceedings of the 20th international conference on the world wide web (WWW), Hyderabad, India, pp 607–614 Suri S, Vassilvitskii S (2011) Counting triangles and the curse of the last reducer. In: Proceedings of the 20th international conference on the world wide web (WWW), Hyderabad, India, pp 607–614
54.
Zurück zum Zitat Van Dongen, S (2000) Graph clustering by flow simulation. Ph.D. Thesis, University of Utrecht, The Netherlands Van Dongen, S (2000) Graph clustering by flow simulation. Ph.D. Thesis, University of Utrecht, The Netherlands
56.
Zurück zum Zitat Vitter J (2001) External memory algorithms and data structures: dealing with massive data. ACM Comput Surv 33(2):209–271CrossRef Vitter J (2001) External memory algorithms and data structures: dealing with massive data. ACM Comput Surv 33(2):209–271CrossRef
57.
Zurück zum Zitat Zhuge H (2009) Communities and emerging semantics in semantic link network: discovery and learning. IEEE Trans Knowl Data Eng 21(6):785–799 Zhuge H (2009) Communities and emerging semantics in semantic link network: discovery and learning. IEEE Trans Knowl Data Eng 21(6):785–799
Metadaten
Titel
Compressed representations for web and social graphs
verfasst von
Cecilia Hernández
Gonzalo Navarro
Publikationsdatum
01.08.2014
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 2/2014
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-013-0648-4

Weitere Artikel der Ausgabe 2/2014

Knowledge and Information Systems 2/2014 Zur Ausgabe

Premium Partner