Abstract
Scaling iterative graph processing applications to large graphs is an important problem. Performance is critical, as data scientists need to execute graph programs many times with varying parameters. The need for a high-level, high-performance programming model has inspired much research on graph programming frameworks.
In this paper, we show that the important class of computationally light graph applications - applications that perform little computation per vertex - has severe scalability problems across multiple cores as these applications hit an early "memory wall" that limits their speedup. We propose a novel block-oriented computation model, in which computation is iterated locally over blocks of highly connected nodes, significantly improving the amount of computation per cache miss. Following this model, we describe the design and implementation of a block-aware graph processing runtime that keeps the familiar vertex-centric programming paradigm while reaping the benefits of block-oriented execution. Our experiments show that block-oriented execution significantly improves the performance of our framework for several graph applications.
- R. Andersen, D. F. Gleich, and V. S. Mirrokni. Overlapping clusters for distributed computation. In WSDM, 2012. Google Scholar
- E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. D. Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition, 1999. Google Scholar
- S. T. Barnard. PMRSB: Parallel multilevel recursive spectral bisection. In SC, page 27, 1995. Google Scholar
- R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. V. der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition. SIAM, Philadelphia, PA, 1994.Google Scholar
- H. Berenson, P. A. Bernstein, J. Gray, J. Melton, E. J. O'Neil, and P. E. O'Neil. A critique of ansi sql isolation levels. In SIGMOD, 1995. Google Scholar
- D. Bertsekas. Nonlinear Programming. Athena Scientific, 1995.Google Scholar
- S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1-7):107-117, 1998. Google Scholar
- A. Chacon and A. Vladimirsky. Fast two-scale methods for eikonal equations. SIAM J. Scientific Computing, 34(2), 2012. Google Scholar
- D. Crandall, A. Owens, N. Snavely, and D. P. Huttenlocher. Discrete-Continuous optimization for large-scale structure from motion. In CVPR, pages 3001-3008, 2011. Google Scholar
- J. W. Demmel. Applied Numerical Linear Algebra. SIAM, 1997. Google Scholar
- G. H. Golub and C. F. Van Loan. Matrix computations (3. ed.). Johns Hopkins University Press, 1996. Google Scholar
- J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In OSDI, 2012. Google Scholar
- M. Griebel and P. Oswald. Greedy and randomized versions of the multiplicative Schwarz method. Linear Algebra Appl., 437(7):1596-1610, 2012.Google Scholar
- W. D. Gropp and D. E. Keyes. Domain decomposition on parallel computers. Impact Comput. Sci. Eng., 1:421-439, 1989. Google Scholar
- A. Gupta, H. V. Jagadish, and I. S. Mumick. Data integration using self-maintainable views. In EDBT, 1996. Google Scholar
- E.-J. Im, K. Yelick, and R. Vuduc. SPARSITY: An optimizing framework for sparse matrix kernels. International Journal of High Performance Computing Applications, 18:135-158, 2004. Google Scholar
- G. Jeh and J. Widom. Scaling personalized web search. In WWW, 2003. Google Scholar
- K. Kambatla, N. Rapolu, S. Jagannathan, and A. Grama. Asynchronous algorithms in MapReduce. In CLUSTER, 2010. Google Scholar
- U. Kang, C. E. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system. In ICDM, 2009. Google Scholar
- G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel hypergraph partitioning: Application in VLSI domain. In DAC, 1997. Google Scholar
- J. Kleinberg and E. Tardos. Algorithm Design. Addison-Wesley Longman Publishing Co., Inc., 2005. Google Scholar
- M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In PLDI, pages 211-222, 2007. Google Scholar
- A. Kyrola, G. Blelloch, and C. Guestrin. GraphChi: Large-scale graph computation on just a PC. In OSDI, 2012. Google Scholar
- C.-K. Liang, C.-C. Cheng, Y.-C. Lai, L.-G. Chen, and H. H. Chen. Hardware-efficient belief propagation. In CVPR, 2009.Google Scholar
- Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A new framework for parallel machine learning. In UAI, 2010.Google Scholar
- Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning in the cloud. PVLDB, 5(8), 2012. Google Scholar
- G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In SIGMOD, 2010. Google Scholar
- F. McSherry, D. Murray, R. Isaacs, and M. Isard. Differential dataflow. In CIDR, 2013.Google Scholar
- R. Nishtala, R. Vuduc, J. Demmel, and K. Yelick. When cache blocking sparse matrix vector multiply works and why. Applicable Algebra in Engineering, Communications, and Computing, 18:297-311, 2007. Google Scholar
- J.-S. Park, M. Penner, and V. Prasanna. Optimizing graph algorithms for improved cache performance. In IPDPS, 2002. Google Scholar
- D. Patterson and J. Hennessey. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, second edition, 1996.Google Scholar
- U. N. Raghavan, R. Albert, and S. Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76:036106, 2002.Google Scholar
- J. A. Sethian. Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge University Press, 1999.Google Scholar
- B. Smith, P. Bjørstad, and W. Gropp. Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Difference Equations. Cambridge University Press, 1996. Google Scholar
- D. Spielman and S.-H. Teng. A local clustering algorithm for massive graphs and its application to nearly-linear time graph partitioning. SIAM J. Comput., 42(1):1-26, 2013.Google Scholar
- I. Stanton and G. Kliot. Streaming graph partitioning for large distributed graphs. In KDD, pages 1222-1230, 2012. Google Scholar
- P. Stutz, A. Bernstein, and W. W. Cohen. Signal/Collect: Graph algorithms for the (semantic) web. In ISWC, 2010. Google Scholar
- L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8), 1990. Google Scholar
- R. Vuduc, J. Demmel, and K. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Proceedings of SciDAC 2005, Journal of Physics: Conference Series, 2005.Google Scholar
- G. Wang, W. Xie, A. Demers, and J. Gehrke. Asynchronous large-scale processing made easy. In CIDR, 2013.Google Scholar
- D. Wingate, N. Powell, Q. Snell, and K. Seppi. Prioritized multiplicative Schwarz procedures for solving linear systems. In IPDPS, 2005. Google Scholar
- Y. Zhang, Q. Gao, L. Gao, and C. Wang. PrIter: A distributed framework for prioritized iterative computations. In SOCC, 2011. Google Scholar
- H. Zhao. Parallel implementation of fast sweeping method. Journal of Computational Mathematics, 25(4):421-429, 2007.Google Scholar
Index Terms
- Fast iterative graph computation with block updates
Recommendations
Fast iterative graph computation: a path centric approach
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and AnalysisLarge scale graph processing represents an interesting challenge due to the lack of locality. This paper presents PathGraph for improving iterative graph computation on graphs with billions of edges. Our system design has three unique features: First, ...
Fast Iterative Graph Computation with Resource Aware Graph Parallel Abstractions
HPDC '15: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed ComputingIterative computation on large graphs has challenged system research from two aspects: (1) how to conduct high performance parallel processing for both in-memory and out-of-core graphs; and (2) how to handle large graphs that exceed the resource ...
Block-graph width
AbstractLet G be a class of graphs. A graph G has G-width k if there are k independent sets N1,…,Nk in G such that G can be embedded into a graph H∈G such that for every edge e in H which is not an edge in G, there exists an i such that both endvertices ...
Comments