skip to main content
10.1145/2484425.2484427acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

GraphX: a resilient distributed graph system on Spark

Published:23 June 2013Publication History

ABSTRACT

From social networks to targeted advertising, big graphs capture the structure in data and are central to recent advances in machine learning and data mining. Unfortunately, directly applying existing data-parallel tools to graph computation tasks can be cumbersome and inefficient. The need for intuitive, scalable tools for graph computation has lead to the development of new graph-parallel systems (e.g., Pregel, PowerGraph) which are designed to efficiently execute graph algorithms. Unfortunately, these new graph-parallel systems do not address the challenges of graph construction and transformation which are often just as problematic as the subsequent computation. Furthermore, existing graph-parallel systems provide limited fault-tolerance and support for interactive data mining.

We introduce GraphX, which combines the advantages of both data-parallel and graph-parallel systems by efficiently expressing graph computation within the Spark data-parallel framework. We leverage new ideas in distributed graph representation to efficiently distribute graphs as tabular data-structures. Similarly, we leverage advances in data-flow systems to exploit in-memory computation and fault-tolerance. We provide powerful new operations to simplify graph construction and transformation. Using these primitives we implement the PowerGraph and Pregel abstractions in less than 20 lines of code. Finally, by exploiting the Scala foundation of Spark, we enable users to interactively load, transform, and compute on massive graphs.

References

  1. A. Abou-Rjeili and G. Karypis. Multilevel algorithms for partitioning power-law graphs. In IPDPS, 2006. 3.1.1 Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. Albert, H. Jeong, and A. L. Barabási. Error and attack tolerance of complex networks. In Nature, volume 406, pages 378--482, 2000. 3.1.1Google ScholarGoogle Scholar
  3. A. Buluç and J. R. Gilbert. The combinatorial blas: design, implementation, and applications. IJHPCA, 25(4):496--509, 2011. 1 Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. U. V. Çatalyürek, C. Aykanat, and B. Uçar. On two-dimensional sparse matrix partitioning: Models, methods, and a recipe. SIAM J. Sci. Comput., 32(2):656--683, 2010. 3.1.1 Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Cheng et al. Kineograph: taking the pulse of a fast-changing and connected world. In EuroSys, 2012. 1 Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In OSDI '12. 1, 2, 3.1.1, 4, 4.1, 4.2 Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. K. Lang. Finding good nearly balanced cuts in power law graphs. Technical Report YRL-2004-036, Yahoo! Research Labs, Pasadena, CA, Nov. 2004. 3.1.1Google ScholarGoogle Scholar
  8. J. Leskovec et al. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29--123, 2008. 3.1.1, 5Google ScholarGoogle ScholarCross RefCross Ref
  9. G. Malewicz, M. H. Austern, A. J. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD, pages 135--146, 2010. 1, 4, 4.1 Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical Report 1999--66, Stanford InfoLab, 1999. 4Google ScholarGoogle Scholar
  11. P. Stutz, A. Bernstein, and W. Cohen. Signal/collect: graph algorithms for the (semantic) web. In ISWC, 2010. 1 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Zaharia et al. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. NSDI, 2012. 1 Google ScholarGoogle ScholarDigital LibraryDigital Library
  1. GraphX: a resilient distributed graph system on Spark

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      GRADES '13: First International Workshop on Graph Data Management Experiences and Systems
      June 2013
      101 pages
      ISBN:9781450321884
      DOI:10.1145/2484425

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 23 June 2013

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate29of61submissions,48%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader