skip to main content
article

Fast iterative graph computation with block updates

Published:01 September 2013Publication History
Skip Abstract Section

Abstract

Scaling iterative graph processing applications to large graphs is an important problem. Performance is critical, as data scientists need to execute graph programs many times with varying parameters. The need for a high-level, high-performance programming model has inspired much research on graph programming frameworks.

In this paper, we show that the important class of computationally light graph applications - applications that perform little computation per vertex - has severe scalability problems across multiple cores as these applications hit an early "memory wall" that limits their speedup. We propose a novel block-oriented computation model, in which computation is iterated locally over blocks of highly connected nodes, significantly improving the amount of computation per cache miss. Following this model, we describe the design and implementation of a block-aware graph processing runtime that keeps the familiar vertex-centric programming paradigm while reaping the benefits of block-oriented execution. Our experiments show that block-oriented execution significantly improves the performance of our framework for several graph applications.

References

  1. R. Andersen, D. F. Gleich, and V. S. Mirrokni. Overlapping clusters for distributed computation. In WSDM, 2012. Google ScholarGoogle Scholar
  2. E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. D. Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition, 1999. Google ScholarGoogle Scholar
  3. S. T. Barnard. PMRSB: Parallel multilevel recursive spectral bisection. In SC, page 27, 1995. Google ScholarGoogle Scholar
  4. R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. V. der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition. SIAM, Philadelphia, PA, 1994.Google ScholarGoogle Scholar
  5. H. Berenson, P. A. Bernstein, J. Gray, J. Melton, E. J. O'Neil, and P. E. O'Neil. A critique of ansi sql isolation levels. In SIGMOD, 1995. Google ScholarGoogle Scholar
  6. D. Bertsekas. Nonlinear Programming. Athena Scientific, 1995.Google ScholarGoogle Scholar
  7. S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1-7):107-117, 1998. Google ScholarGoogle Scholar
  8. A. Chacon and A. Vladimirsky. Fast two-scale methods for eikonal equations. SIAM J. Scientific Computing, 34(2), 2012. Google ScholarGoogle Scholar
  9. D. Crandall, A. Owens, N. Snavely, and D. P. Huttenlocher. Discrete-Continuous optimization for large-scale structure from motion. In CVPR, pages 3001-3008, 2011. Google ScholarGoogle Scholar
  10. J. W. Demmel. Applied Numerical Linear Algebra. SIAM, 1997. Google ScholarGoogle Scholar
  11. G. H. Golub and C. F. Van Loan. Matrix computations (3. ed.). Johns Hopkins University Press, 1996. Google ScholarGoogle Scholar
  12. J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In OSDI, 2012. Google ScholarGoogle Scholar
  13. M. Griebel and P. Oswald. Greedy and randomized versions of the multiplicative Schwarz method. Linear Algebra Appl., 437(7):1596-1610, 2012.Google ScholarGoogle Scholar
  14. W. D. Gropp and D. E. Keyes. Domain decomposition on parallel computers. Impact Comput. Sci. Eng., 1:421-439, 1989. Google ScholarGoogle Scholar
  15. A. Gupta, H. V. Jagadish, and I. S. Mumick. Data integration using self-maintainable views. In EDBT, 1996. Google ScholarGoogle Scholar
  16. E.-J. Im, K. Yelick, and R. Vuduc. SPARSITY: An optimizing framework for sparse matrix kernels. International Journal of High Performance Computing Applications, 18:135-158, 2004. Google ScholarGoogle Scholar
  17. G. Jeh and J. Widom. Scaling personalized web search. In WWW, 2003. Google ScholarGoogle Scholar
  18. K. Kambatla, N. Rapolu, S. Jagannathan, and A. Grama. Asynchronous algorithms in MapReduce. In CLUSTER, 2010. Google ScholarGoogle Scholar
  19. U. Kang, C. E. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system. In ICDM, 2009. Google ScholarGoogle Scholar
  20. G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel hypergraph partitioning: Application in VLSI domain. In DAC, 1997. Google ScholarGoogle Scholar
  21. J. Kleinberg and E. Tardos. Algorithm Design. Addison-Wesley Longman Publishing Co., Inc., 2005. Google ScholarGoogle Scholar
  22. M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In PLDI, pages 211-222, 2007. Google ScholarGoogle Scholar
  23. A. Kyrola, G. Blelloch, and C. Guestrin. GraphChi: Large-scale graph computation on just a PC. In OSDI, 2012. Google ScholarGoogle Scholar
  24. C.-K. Liang, C.-C. Cheng, Y.-C. Lai, L.-G. Chen, and H. H. Chen. Hardware-efficient belief propagation. In CVPR, 2009.Google ScholarGoogle Scholar
  25. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A new framework for parallel machine learning. In UAI, 2010.Google ScholarGoogle Scholar
  26. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning in the cloud. PVLDB, 5(8), 2012. Google ScholarGoogle Scholar
  27. G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In SIGMOD, 2010. Google ScholarGoogle Scholar
  28. F. McSherry, D. Murray, R. Isaacs, and M. Isard. Differential dataflow. In CIDR, 2013.Google ScholarGoogle Scholar
  29. R. Nishtala, R. Vuduc, J. Demmel, and K. Yelick. When cache blocking sparse matrix vector multiply works and why. Applicable Algebra in Engineering, Communications, and Computing, 18:297-311, 2007. Google ScholarGoogle Scholar
  30. J.-S. Park, M. Penner, and V. Prasanna. Optimizing graph algorithms for improved cache performance. In IPDPS, 2002. Google ScholarGoogle Scholar
  31. D. Patterson and J. Hennessey. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, second edition, 1996.Google ScholarGoogle Scholar
  32. U. N. Raghavan, R. Albert, and S. Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76:036106, 2002.Google ScholarGoogle Scholar
  33. J. A. Sethian. Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge University Press, 1999.Google ScholarGoogle Scholar
  34. B. Smith, P. Bjørstad, and W. Gropp. Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Difference Equations. Cambridge University Press, 1996. Google ScholarGoogle Scholar
  35. D. Spielman and S.-H. Teng. A local clustering algorithm for massive graphs and its application to nearly-linear time graph partitioning. SIAM J. Comput., 42(1):1-26, 2013.Google ScholarGoogle Scholar
  36. I. Stanton and G. Kliot. Streaming graph partitioning for large distributed graphs. In KDD, pages 1222-1230, 2012. Google ScholarGoogle Scholar
  37. P. Stutz, A. Bernstein, and W. W. Cohen. Signal/Collect: Graph algorithms for the (semantic) web. In ISWC, 2010. Google ScholarGoogle Scholar
  38. L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8), 1990. Google ScholarGoogle Scholar
  39. R. Vuduc, J. Demmel, and K. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Proceedings of SciDAC 2005, Journal of Physics: Conference Series, 2005.Google ScholarGoogle Scholar
  40. G. Wang, W. Xie, A. Demers, and J. Gehrke. Asynchronous large-scale processing made easy. In CIDR, 2013.Google ScholarGoogle Scholar
  41. D. Wingate, N. Powell, Q. Snell, and K. Seppi. Prioritized multiplicative Schwarz procedures for solving linear systems. In IPDPS, 2005. Google ScholarGoogle Scholar
  42. Y. Zhang, Q. Gao, L. Gao, and C. Wang. PrIter: A distributed framework for prioritized iterative computations. In SOCC, 2011. Google ScholarGoogle Scholar
  43. H. Zhao. Parallel implementation of fast sweeping method. Journal of Computational Mathematics, 25(4):421-429, 2007.Google ScholarGoogle Scholar

Index Terms

  1. Fast iterative graph computation with block updates
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image Proceedings of the VLDB Endowment
          Proceedings of the VLDB Endowment  Volume 6, Issue 14
          September 2013
          384 pages

          Publisher

          VLDB Endowment

          Publication History

          • Published: 1 September 2013
          Published in pvldb Volume 6, Issue 14

          Qualifiers

          • article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader