skip to main content
research-article

Systems for big-graphs

Published:01 August 2014Publication History
Skip Abstract Section

Abstract

Graphs have become increasingly important to represent highly-interconnected structures and schema-less data including the World Wide Web, social networks, knowledge graphs, genome and scientific databases, medical and government records. The massive scale of graph data easily overwhelms the main memory and computation resources on commodity servers. In these cases, achieving low latency and high throughput requires partitioning the graph and processing the graph data in parallel across a cluster of servers. However, the software and and hardware advances that have worked well for developing parallel databases and scientific applications are not necessarily effective for big-graph problems. Graph processing poses interesting system challenges: graphs represent relationships which are usually irregular and unstructured; and therefore, the computation and data access patterns have poor locality. Hence, the last few years has seen an unprecedented interest in building systems for big-graphs by various communities including databases, systems, semantic web, machine learning, and operations research. In this tutorial, we discuss the design of the emerging systems for processing of big-graphs, key features of distributed graph algorithms, as well as graph partitioning and workload balancing techniques. We emphasize the current challenges and highlight some future research directions.

References

  1. P. Cudr-Mauroux and S. Elnikety. Graph Data Management Systems for New Application Domains. In VLDB, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed Graph-parallel Computation on Natural Graphs. In OSDI, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. D. Jiang, G. Chen, B. C. Ooi, K.-L. Tan, and S. Wu. epiC: an Extensible and Scalable System for Processing Big Data. In VLDB, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. Khan, Y. Wu, and X. Yan. Emerging Graph Queries in Linked Data. In ICDE, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Z. Khayyat, K. Awara, A. Alonazi, H. Jamjoom, D. Williams, and P. Kalnis. Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing. In EuroSys, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Kyrola, G. Blelloch, and C. Guestrin. GraphChi: Large-scale Graph Computation on Just a PC. In OSDI, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. In VLDB, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Lumsdaine, D. Gregor, B. Hendrickson, and J. W. Berry. Challenges in Parallel Graph Processing. Parallel Processing Letters, 17(1):5--20, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  9. G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-scale Graph Processing. In SIGMOD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Mondal and A. Deshpande. Managing Large Dynamic Graphs Efficiently. In SIGMOD, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: a Timely Dataflow System. In SOSP, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. Roy, I. Mihailovic, and W. Zwaenepoel. X-Stream: Edge-centric Graph Processing Using Streaming Partitions. In SOSP, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. S. Sakr, S. Elnikety, and Y. He. G-SPARQL: a Hybrid Engine for Querying Large Attributed Graphs. In CIKM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Sarwat, S. Elnikety, Y. He, and M. F. Mokbel. Horton+: A Distributed System for Processing Declarative Reachability Queries over Partitioned Graphs. In VLDB, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. R. S. Xin, D. Crankshaw, A. Dave, J. E. Gonzalez, M. J. Franklin, and I. Stoica. GraphX: Unifying Data-Parallel and Graph-Parallel Analytics. CoRR, abs/1402.2394, 2014.Google ScholarGoogle Scholar
  16. S. Yang, X. Yan, B. Zong, and A. Khan. Towards Effective Partition Management for Large Graphs. In SIGMOD, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. K. Zeng, J. Yang, H. Wang, B. Shao, and Z. Wang. A Distributed Graph Engine for Web Scale RDF Data. In VLDB, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Systems for big-graphs
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image Proceedings of the VLDB Endowment
      Proceedings of the VLDB Endowment  Volume 7, Issue 13
      August 2014
      466 pages
      ISSN:2150-8097
      Issue’s Table of Contents

      Publisher

      VLDB Endowment

      Publication History

      • Published: 1 August 2014
      Published in pvldb Volume 7, Issue 13

      Qualifiers

      • research-article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader