Abstract
Graph analytics on social networks, Web data, and communication networks has been widely used in a plethora of applications. Many graph analytics algorithms are based on breadth-first search (BFS) graph traversal, which is not only time-consuming for large datasets but also involves much redundant computation when executed multiple times from different start vertices. In this paper, we propose Multi-Source BFS (MS-BFS), an algorithm that is designed to run multiple concurrent BFSs over the same graph on a single CPU core while scaling up as the number of cores increases. MS-BFS leverages the properties of small-world networks, which apply to many real-world graphs, and enables efficient graph traversal that: (i) shares common computation across concurrent BFSs; (ii) greatly reduces the number of random memory accesses; and (iii) does not incur synchronization costs. We demonstrate how a real graph analytics application---all-vertices closeness centrality---can be efficiently solved with MS-BFS. Furthermore, we present an extensive experimental evaluation with both synthetic and real datasets, including Twitter and Wikipedia, showing that MS-BFS provides almost linear scalability with respect to the number of cores and excellent scalability for increasing graph sizes, outperforming state-of-the-art BFS algorithms by more than one order of magnitude when running a large number of BFSs.
- Graph 500 Benchmark, 2014. http://www.graph500.org/.Google Scholar
- V. Agarwal, F. Petrini, D. Pasetto, and D. A. Bader. Scalable Graph Exploration on Multicore Processors. In SC '10, pages 1--11, 2010. Google ScholarDigital Library
- L. A. N. Amaral, A. Scala, M. Barthélémy, and H. E. Stanley. Classes of Small-World Networks. PNAS, 97(21): 11149--11152, 2000.Google Scholar
- L. Backstrom, P. Boldi, M. Rosa, J. Ugander, and S. Vigna. Four Degrees of Separation. In WebSci '12, pages 33--42, 2012. Google ScholarDigital Library
- D. A. Bader and K. Madduri. Designing Multithreaded Algorithms for Breadth-First Search and St-connectivity on the Cray MTA-2. In ICPP '06, pages 523--530, 2006. Google ScholarDigital Library
- D. A. Bader and K. Madduri. SNAP, Small-world Network Analysis and Partitioning: An Open-source Parallel Graph Framework for the Exploration of Large-scale Networks. In IPDPS, pages 1--12, 2008.Google Scholar
- S. Beamer, K. Asanović, and D. Patterson. Direction-Optimizing Breadth-First Search. In SC '12, pages 12:1--12:10, 2012. Google ScholarDigital Library
- P. Boncz. LDBC: Benchmarks for Graph and RDF Data Management. In IDEAS '13, pages 1--2, 2013. Google ScholarDigital Library
- U. Brandes. A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology, 25: 163--177, 2001.Google ScholarCross Ref
- A. Buluç and K. Madduri. Parallel Breadth-First Search on Distributed Memory Systems. In SC '11, pages 65:1--65:12, 2011. Google ScholarDigital Library
- F. Checconi, F. Petrini, J. Willcock, A. Lumsdaine, A. R. Choudhury, and Y. Sabharwal. Breaking the Speed and Scalability Barriers for Graph Exploration on Distributed-Memory Machines. In SC '12, pages 13:1--13:12, 2012. Google ScholarDigital Library
- J. Cheng, Z. Shang, H. Cheng, H. Wang, and J. X. Yu. K-Reach: Who is in Your Small World. PVLDB, 5(11): 1292--1303, July 2012. Google ScholarDigital Library
- B. V. Cherkassky, A. V. Goldberg, and T. Radzik. Shortest Paths Algorithms: Theory and Experimental Evaluation. Mathematical Programming, 73(2): 129--174, 1996. Google ScholarDigital Library
- J. Chhugani, N. Satish, C. Kim, J. Sewall, and P. Dubey. Fast and Efficient Graph Traversal Algorithm for CPUs: Maximizing Single-Node Efficiency. In IPDPS '12, pages 378--389, May 2012. Google ScholarDigital Library
- Faunus -- Graph Analytics Engine, 2014. http://thinkaurelius.github.io/faunus/.Google Scholar
- D. Gregor and A. Lumsdaine. The Parallel BGL: A Generic Library for Distributed Graph Computations. In POOSC, 2005.Google Scholar
- P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh. WTF: The Who to Follow Service at Twitter. In WWW '13, pages 505--514, 2013. Google ScholarDigital Library
- S. Hong, T. Oguntebi, and K. Olukotun. Efficient Parallel Graph Exploration on Multi-Core CPU and GPU. In PACT '11, pages 78--88, Oct 2011. Google ScholarDigital Library
- A. Kemper and T. Neumann. HyPer: A Hybrid OLTP&OLAP Main Memory Database System Based on Virtual Memory Snapshots. In ICDE'11, pages 195--206, 2011. Google ScholarDigital Library
- A. Landherr, B. Friedl, and J. Heidemann. A Critical Review of Centrality Measures in Social Networks. Business & Information Systems Engineering, 2(6): 371--385, 2010.Google ScholarCross Ref
- LDBC Social Network Data Generator, 2014. https://github.com/ldbc/ldbc_snb_datagen.Google Scholar
- J. Lee, C. Jung, D. Lim, and Y. Solihin. Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems. TPDS, 20(9): 1309--1324, 2009. Google ScholarDigital Library
- Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. PVLDB, 5(8): 716--727, April 2012. Google ScholarDigital Library
- A. Lumsdaine, D. Gregor, B. Hendrickson, and J. Berry. Challenges in Parallel Graph Processing. Parallel Processing Letters, 17(1): 5--20, 2007.Google ScholarCross Ref
- G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-scale Graph Processing. In SIGMOD '10, pages 135--146, 2010. Google ScholarDigital Library
- Neo4j, 2014. http://neo4j.com/.Google Scholar
- P. Olsen, A. Labouseur, and J.-H. Hwang. Efficient Top-K Closeness Centrality Search. In ICDE '14, pages 196--207, March 2014.Google Scholar
- L. Qin, J. X. Yu, L. Chang, H. Cheng, C. Zhang, and X. Lin. Scalable Big Graph Processing in MapReduce. In SIGMOD '14, pages 827--838, 2014. Google ScholarDigital Library
- A. Rowstron, D. Narayanan, A. Donnelly, G. O'Shea, and A. Douglas. Nobody Ever Got Fired for Using Hadoop on a Cluster. In HotCDP '12, pages 2:1--2:5, 2012. Google ScholarDigital Library
- N. Satish, C. Kim, J. Chhugani, and P. Dubey. Large-Scale Energy-efficient Graph Traversal: A Path to Efficient Data-Intensive Supercomputing. In SC '12, pages 14:1--14:11, 2012. Google ScholarDigital Library
- B. Shao, H. Wang, and Y. Li. Trinity: A Distributed Graph Engine on a Memory Cloud. In SIGMOD '13, pages 505--516, 2013. Google ScholarDigital Library
- D. Simmen, K. Schnaitter, J. Davis, Y. He, S. Lohariwala, A. Mysore, V. Shenoi, M. Tan, and X. Yu. Large-Scale Graph Analytics in Aster 6: Bringing Context to Big Data Discovery. PVLDB, 7(13): 1405--1416, August 2014. Google ScholarDigital Library
- Titan -- Distributed Graph Database, 2014. http://thinkaurelius.github.io/titan/.Google Scholar
- S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications, volume 8. Cambridge University Press, 1994.Google ScholarCross Ref
Index Terms
- The more the merrier: efficient multi-source graph traversal
Recommendations
On the Multichromatic Number of s-Stable Kneser Graphs
For positive integers n and s, a subset Sï [n] is s-stable if sï |i-j|ï n-s for distinct i,j∈S . The s-stable r-uniform Kneser hypergraph KGrn,ks-stable is the r-uniform hypergraph that has the collection of all s-stable k-element subsets of [n] as ...
Adjacent vertex-distinguishing edge and total chromatic numbers of hypercubes
An adjacent vertex-distinguishing edge coloring of a simple graph G is a proper edge coloring of G such that incident edge sets of any two adjacent vertices are assigned different sets of colors. A total coloring of a graph G is a coloring of both the ...
Forbidden Subgraphs and Weak Locally Connected Graphs
A graph is called H-free if it has no induced subgraph isomorphic to H. A graph is called $$N^i$$Ni-locally connected if $$G[\{ x\in V(G): 1\le d_G(w, x)\le i\}]$$G[{x?V(G):1≤dG(w,x)≤i}] is connected and $$N_2$$N2-locally connected if $$G[\{uv: \{uw, vw\...
Comments