ABSTRACT
We investigate applying general-purpose join algorithms to the triangle listing problem on heterogeneous systems that feature a multi-core CPU and multiple GPUs. In particular, we consider an out-of-core context where graph data are available on secondary storage such as a solid-state disk (SSD) and may not fit in the CPU main memory or GPU device memory. We focus on Leapfrog Triejoin (LFTJ), a recently proposed, worst-case optimal algorithm and present "boxing": a novel yet conceptually simple approach for partitioning and feeding out-of-core input data to LFTJ. The "boxing" algorithm integrates well with a GPU-Optimized LFTJ algorithm for triangle listing. We achieve significant performance gains on a heterogeneous system comprised of GPUs and CPU by utilizing the massive-parallel computation capability of GPUs. Our experimental evaluations on real-world and synthetic data sets (some of which containing more than 1.2 billion edges) show that out-of-core LFTJ is competitive with specialized graph algorithms for triangle listing. By using one or two GPUs, we achieve additional speedups on the same graphs.
- A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins. In FOCS'08, pages 739--748. IEEE, 2008. Google ScholarDigital Library
- S. Baxter. Modern gpu. http://nvlabs.github.io/moderngpu/, 2013.Google Scholar
- D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. In SDM, volume 4, pages 442--446. SIAM, 2004.Google Scholar
- S. Chu and J. Cheng. Triangle listing in massive networks and its applications. In SIGKDD, pages 672--680. ACM, 2011. Google ScholarDigital Library
- R. Dementiev. Algorithm engineering for large data sets. PhD thesis, Saarland University, 2006.Google Scholar
- G. Diamos, H. Wu, J. Wang, A. Lele, and S. Yalamanchili. Relational algorithms for multi-bulk-synchronous processors. PPoPP '13, pages 301--302, 2013. Google ScholarDigital Library
- X. Hu, Y. Tao, and C.-W. Chung. Massive graph triangulation. In SIGMOD, pages 325--336. ACM, 2013. Google ScholarDigital Library
- M. A. Khamis, H. Q. Ngo, C. Ré, and A. Rudra. A resolution-based framework for joins: Worst-case and beyond. CoRR, abs/1404.0703, 2014.Google Scholar
- H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In WWW '10, pages 591--600, New York, NY, USA, 2010. ACM. Google ScholarDigital Library
- Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed graphlab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 5(8):716--727, Apr. 2012. Google ScholarDigital Library
- B. Menegola. An external memory algorithm for listing triangles. Technical report, Universidade Federal do Rio Grande do Sul, 2010.Google Scholar
- A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and Analysis of Online Social Networks. In IMC'07, San Diego, CA, October 2007. Google ScholarDigital Library
- H. Q. Ngo, D. T. Nguyen, C. Ré, and A. Rudra. Beyond worst-case analysis for joins with minesweeper. CoRR, abs/1302.0914, 2014. Google ScholarDigital Library
- H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms:{extended abstract}. In PODS, pages 37--48. ACM, 2012. Google ScholarDigital Library
- R. Pagh and F. Silvestri. The input/output complexity of triangle enumeration. In PODS'14, pages 224--233, 2014. Google ScholarDigital Library
- G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814--818, 2005.Google ScholarCross Ref
- J. W. Raymond, E. J. Gardiner, and P. Willett. Rascal: Calculation of graph similarity using maximum common edge subgraphs. The Computer Journal, 45(6):631--644, 2002.Google ScholarCross Ref
- N. Rhodes, P. Willett, A. Calvet, J. B. Dunbar, and C. Humblet. Clip: similarity searching of 3d databases using clique detection. Journal of chemical information and computer sciences, 43(2):443--448, 2003.Google Scholar
- T. Schank. Algorithmic aspects of triangle-based network analysis. Phd in computer science, University Karlsruhe, 2007.Google Scholar
- J. Seo, J. Park, J. Shin, and M. S. Lam. Distributed socialite: a datalog-based language for large-scale graph analysis. Proceedings of the VLDB Endowment, 6(14):1906--1917, 2013. Google ScholarDigital Library
- T. L. Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. In ICDT, pages 96--106, 2014.Google Scholar
- H. Wu, G. Diamos, T. Sheard, M. Aref, S. Baxter, M. Garland, and S. Yalamanchili. Red fox: An execution environment for relational query processing on gpus. CGO '14, pages 44:44--44:54, 2014. Google ScholarDigital Library
- H. Wu, G. Diamos, J. Wang, S. Cadambi, S. Yalamanchili, and S. Chakradhar. Optimizing data warehousing applications for gpus using kernel fusion/fission. In PLC, IPDPSW '12, pages 2433--2442, 2012. Google ScholarDigital Library
- H. Wu, D. Zinn, M. Aref, and S. Yalamanchili. Multipredicate join algorithms for accelerating relational graph processing on GPUs. ADMS 2014, September 2014.Google Scholar
- J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. CoRR, abs/1205.6233, 2012. Google ScholarDigital Library
- D. Zinn. General-purpose join algorithms for listing triangles in large graphs. CoRR, abs/1501.06689, 2015.Google Scholar
Index Terms
- General-purpose join algorithms for large graph triangle listing on heterogeneous systems
Recommendations
A performance study of general-purpose applications on graphics processors using CUDA
Graphics processors (GPUs) provide a vast number of simple, data-parallel, deeply multithreaded cores and high memory bandwidths. GPU architectures are becoming increasingly programmable, offering the potential for dramatic speedups for a variety of ...
Accelerating CUDA graph algorithms at maximum warp
PPoPP '11Graphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems but their performance suffered heavily when the graph structure ...
Accelerating CUDA graph algorithms at maximum warp
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programmingGraphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems but their performance suffered heavily when the graph structure ...
Comments