ABSTRACT
There is an increasing need to ingest, manage, and query large volumes of graph-structured data arising in applications such as social networks, communication networks, and biological networks. Graph databases that can explicitly reason about the graphical nature of the data, support flexible schemas, and enable node-centric or edge-centric analysis and querying are ideal for storing such data. However, although there is much work on single-site graph databases and on efficiently executing specific types of queries over large graphs, to date there is little work on understanding the challenges of distributed graph databases, which are needed to handle the large scale of such data. In this paper, we propose the design of an in-memory, distributed graph data management system aimed at managing a large-scale, dynamically changing graph and supporting low-latency query processing over it. The key challenge in a distributed graph database is that partitioning a graph across a set of machines inherently results in a large number of distributed traversals across partitions to answer even simple queries. We propose aggressive replication of the nodes in the graph to support low-latency querying, and we investigate three novel techniques to minimize the communication bandwidth and the storage requirements. First, we develop a hybrid replication policy that monitors node read-write frequencies to dynamically decide what data to replicate and whether to do so eagerly or lazily. Second, we propose a clustering-based approach to amortize the cost of making these replication decisions. Finally, we propose using a fairness criterion to dictate how replication decisions should be made. We provide both theoretical analysis and efficient algorithms for the optimization problems that arise. We have implemented our framework as a middleware on top of the open-source CouchDB key-value store.
We evaluate our system on a social graph and show that it handles very large graphs efficiently while significantly reducing network bandwidth consumption.
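To make the hybrid replication policy concrete, the sketch below shows one plausible way a per-node decision could be driven by monitored read-write frequencies. This is an illustrative assumption, not the paper's actual algorithm: the thresholds, the `replication_decision` function, and the policy names are all hypothetical.

```python
def replication_decision(reads, writes,
                         replicate_threshold=2.0,
                         eager_threshold=10.0):
    """Hypothetical per-node replication policy (illustrative only).

    Given monitored read and write counts for a node, return one of:
      "no-replica" -- writes dominate; remote copies would churn constantly
      "lazy"       -- replicate, but propagate updates on demand
      "eager"      -- reads dominate heavily; push every update immediately
    The two threshold parameters are assumed tuning knobs, not values
    from the paper.
    """
    if writes == 0:
        ratio = float("inf") if reads > 0 else 0.0
    else:
        ratio = reads / writes

    if ratio < replicate_threshold:
        # Frequent writes: keeping remote replicas consistent would cost
        # more bandwidth than the remote reads it saves.
        return "no-replica"
    if ratio >= eager_threshold:
        # Overwhelmingly read-heavy: eager (push-based) replication keeps
        # remote reads local and fast.
        return "eager"
    # Moderately read-heavy: replicate, but update lazily to cap bandwidth.
    return "lazy"
```

For example, a celebrity node with 100 reads per write would be replicated eagerly, while a node updated as often as it is read would not be replicated at all. The actual system described in the paper additionally amortizes these decisions over clusters of nodes and applies a fairness criterion across partitions.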
Index Terms
- Managing large dynamic graphs efficiently