skip to main content
10.1145/2213836.2213854acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Managing large dynamic graphs efficiently

Published:20 May 2012Publication History

ABSTRACT

There is an increasing need to ingest, manage, and query large volumes of graph-structured data arising in applications like social networks, communication networks, biological networks, and so on. Graph databases that can explicitly reason about the graphical nature of the data, that can support flexible schemas and node-centric or edge-centric analysis and querying, are ideal for storing such data. However, although there is much work on single-site graph databases and on efficiently executing different types of queries over large graphs, to date there is little work on understanding the challenges in distributed graph databases, needed to handle the large scale of such data. In this paper, we propose the design of an in-memory, distributed graph data management system aimed at managing a large-scale dynamically changing graph, and supporting low-latency query processing over it. The key challenge in a distributed graph database is that, partitioning a graph across a set of machines inherently results in a large number of distributed traversals across partitions to answer even simple queries. We propose aggressive replication of the nodes in the graph for supporting low-latency querying, and investigate three novel techniques to minimize the communication bandwidth and the storage requirements. First, we develop a hybrid replication policy that monitors node read-write frequencies to dynamically decide what data to replicate, and whether to do eager or lazy replication. Second, we propose a clustering-based approach to amortize the costs of making these replication decisions. Finally, we propose using a fairness criterion to dictate how replication decisions should be made. We provide both theoretical analysis and efficient algorithms for the optimization problems that arise. We have implemented our framework as a middleware on top of the open-source CouchDB key-value store. We evaluate our system on a social graph, and show that our system is able to handle very large graphs efficiently, and that it reduces the network bandwidth consumption significantly.

References

  1. B. Amann and M. Scholl. Gram: a graph data model and query languages. In ACM Hypertext, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. L. A. N. Amaral, A. Scala, M. Barthelemy, and H. E. Stanley. Classes of small-world networks. Proceedings of The National Academy of Sciences, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  3. A. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 1999.Google ScholarGoogle Scholar
  4. F. Benevenuto, T. Rodrigues, M. Cha, and V. A. F. Almeida. Characterizing user behavior in online social networks. In Internet Measurement Conference, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using banks. In ICDE, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics-theory and Experiment, 2008.Google ScholarGoogle Scholar
  7. M. Brocheler, A. Pugliese, and V. S. Subrahmanian. COSI: Cloud oriented subgraph identification in massive social networks. In ASONAM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Capocci, V. D. P. Servedio, F. Colaiori, L. S. Buriol, D. Donato, S. Leonardi, and G. Caldarelli. Preferential attachment in the growth of social networks: The internet encyclopedia wikipedia. Phys. Rev. E, 2006.Google ScholarGoogle Scholar
  9. U. V. Catalyurek and C. Aykanat. Patoh: Partitioning tool for hypergraphs. Bilkent University, Tech. Rep, 1999.Google ScholarGoogle Scholar
  10. FactNexus Pty Ltd. Graphbase. http://www.graphbase.net/, 2011.Google ScholarGoogle Scholar
  11. W. Fan, J. Li, S. Ma, N. Tang, Y. Wu, and Y. Wu. Graph pattern matching: from intractable to polynomial time. In VLDB, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. F. Georgakopoulos and K. Politopoulos. Max-density revisited: a generalization and a more efficient algorithm. The Computer Journal, 2007.Google ScholarGoogle Scholar
  13. S. A. Golder, D. M. Wilkinson, and B. A. Huberman. Rhythms of social interaction: messaging within a massive online network. CoRR, abs/cs/0611137, 2006.Google ScholarGoogle Scholar
  14. K. Golenberg, B. Kimelfeld, and Y. Sagiv. Keyword proximity search in complex data graphs. In SIGMOD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. R. Gonzalez, R. C. Rumin, A. Cuevas, and C. Guerrero. Where are my followers? understanding the locality effect in twitter. CoRR, abs/1105.3682, 2011.Google ScholarGoogle Scholar
  16. R. Guting. GraphDB: Modeling and querying graphs in databases. In VLDB, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. Gyssens, J. Paredaens, and D. van Gucht. A graph-oriented object database model. In PODS, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. Hamilton. Scaling linkedin. http://perspectives.mvdirona.com/2008/06/08/ScalingLinkedIn.aspx, 2008.Google ScholarGoogle Scholar
  19. H. He and A. K. Singh. Graphs-at-a-time: query language and access methods for graph databases. In SIGMOD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. B. Hendrickson and R. Leland. An improved spectral graph partitioning algorithm for mapping parallel computations. SIAM Journal on Scientific Computing, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. High Scalability Blog. Why are facebook, digg, and twitter so hard to scale? http://highscalability.com/blog/2009/10/13/why-are-facebook-digg-and-twitter-so-hard-to-scale.html, 2009.Google ScholarGoogle Scholar
  22. High Scalibility Blog. Friendster lost lead because of a failure to scale. http://highscalability.com/blog/ 2007/11/13/friendster-lost-lead-because-of-a-failure-to-scale.html, 2007.Google ScholarGoogle Scholar
  23. J. Huang, D. Abadi, and K. Ren. Scalable SPARQL querying of large RDF graphs. In VLDB, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. InfiniteGraph. http://www.infinitegraph.com/, 2011.Google ScholarGoogle Scholar
  25. R. Jin, Y. Xiang, N. Ruan, and H. Wang. Efficiently answering reachability queries on very large directed graphs. In SIGMOD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H. Karambelkar. Bidirectional expansion for keyword search on graph databases. In VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. S. Kadambi, J. Chen, B. Cooper, D. Lomax, R. Ramakrishnan, A. Silberstein, E. Tam, and H. G. Molina. Where in the world is my data? In VLDB, 2011.Google ScholarGoogle Scholar
  28. U. Kang, C. E. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system. In ICDM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. In SIAM, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. S. Khuller and B. Saha. On finding dense subgraphs. In ICALP, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Kobrix Software. A general purpose distributed data store, 2011. http://www.kobrix.com/hgdb.jsp.Google ScholarGoogle Scholar
  32. R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. In KDD, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Statistical properties of community structure in large social and information networks. In WWW, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Journal of Internet Mathematics, 2009.Google ScholarGoogle Scholar
  35. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In PODC, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Microsoft Research. Trinity. http://research.microsoft.com/en-us/projects/trinity/, 2011.Google ScholarGoogle Scholar
  37. Neo4j. Neo4j open source nosql graph database. http://neo4j.org/, 2011.Google ScholarGoogle Scholar
  38. M. Newman. Why social networks are different from other types of networks. Physical Review E, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  39. M. Newman. Modularity and community structure in networks. In Proc. of The Natl. Academy of Sciences, 2006.Google ScholarGoogle Scholar
  40. J. M. Pujol, V. Erramilli, and P. Rodriguez. Divide and conquer: Partitioning online social networks. CoRR, abs/0905.4918, 2009.Google ScholarGoogle Scholar
  41. J. M. Pujol, V. Erramilli, G. Siganos, X. Yang, N. Laoutaris, P. Chhabra, and P. Rodriguez. The little engine(s) that could: scaling online social networks. In SIGCOMM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. L. Sheng and Z. Ozsoyoglu. A graph query language and its query processing. In ICDE, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. A. Silberstein, J. Terrace, B. F. Cooper, and R. Ramakrishnan. Feeding Frenzy: Selectively materializing users' event feeds. In SIGMOD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. T. Tran, H. Wang, S. Rudolph, and P. Cimiano. Top-k exploration of query candidates for efficient keyword search on graph-shaped (RDF) data. In ICDE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. FlockDB. https://github.com/twitter/flockdb.Google ScholarGoogle Scholar
  46. O. Wolfson, S. Jajodia, and Y. Huang. An adaptive data replication algorithm. In TODS, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. O. Wolfson and A. Milo. The multicast policy and its relationship to replicated data placement. In TODS, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. H. Yildirim, V. Chaoji, and M. J. Zaki. GRAIL: Scalable reachability index for large graphs. In VLDB, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Managing large dynamic graphs efficiently

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
          May 2012
          886 pages
          ISBN:9781450312479
          DOI:10.1145/2213836

          Copyright © 2012 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 20 May 2012

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          SIGMOD '12 Paper Acceptance Rate48of289submissions,17%Overall Acceptance Rate785of4,003submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader