Abstract
Truss decomposition is to divide a graph into a hierarchy of subgraphs, or trusses. A subgraph is a k-truss (k ≥ 2) if each edge is in at least k --- 2 triangles in the subgraph. Existing algorithms work by first counting the number of triangles each edge is in and then iteratively incrementing k to peel off the edges that will not appear in (k + 1)-truss. Due to the data and computation intensity, truss decomposition on billion-edge graphs takes hours to complete on a commodity computer.
We propose to accelerate in-memory truss decomposition by (1) compacting intermediate results to optimize memory access, (2) dynamically adjusting the computation based on data characteristics, and (3) parallelizing the algorithm on both the multicore CPU and the GPU. In particular, we optimize the triangle enumeration with data skew handling, and determine at runtime whether to pursue peeling or direct triangle counting to obtain a certain k-truss. We further develop a CPU-GPU co-processing strategy in which the CPU first computes intermediate results and sends the compacted results to the GPU for further computation. Our experiments on real-world datasets show that our implementations outperform the state of the art by up to an order of magnitude. Our source code is publicly available at https://github.com/RapidsAtHKUST/AccTrussDecomposition.
- Cub documentation. http://nvlabs.github.io/cub/. Accessed in 2020.Google Scholar
- Gcc atomic built-ins. https://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html. Accessed in 2020.Google Scholar
- Openmp documentation. https://www.openmp.org/. Accessed in 2020.Google Scholar
- E. Akbas and P. Zhao. Truss-based community search: a truss-equivalence based indexing approach. PVLDB, 10(11):1298--1309, 2017. Google ScholarDigital Library
- M. Almasri, O. Anjum, C. Pearson, Z. Qureshi, V. S. Mailthody, R. Nagi, J. Xiong, and W.-m. Hwu. Update on k-truss decomposition on gpu. In HPEC, pages 1--7. IEEE, 2019.Google ScholarCross Ref
- P. Boldi, M. Rosa, M. Santini, and S. Vigna. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In WWW, pages 587--596, 2011. Google ScholarDigital Library
- P. Boldi and S. Vigna. The webgraph framework i: compression techniques. In WWW, pages 595--602, 2004. Google ScholarDigital Library
- L. Chang and L. Qin. Minimum degree-based core decomposition. In Cohesive Subgraph Computation over Large Sparse Graphs, pages 21--39. Springer, 2018.Google ScholarCross Ref
- Y. Che. Rmat graph format converter code repository. https://github.com/RapidsAtHKUST/KroneckerBinEdgeListToCSR. Accessed in 2020.Google Scholar
- Y. Che. Rmat graph generator code repository. https://github.com/RapidsAtHKUST/Graph500KroneckerGraphGenerator. Accessed in 2020.Google Scholar
- Y. Che, Z. Lai, S. Sun, Q. Luo, and Y. Wang. Accelerating all-edge common neighbor counting on three processors. In ICPP, pages 1--10, 2019. Google ScholarDigital Library
- Y. Che, Z. Lai, S. Sun, Y. Wang, and Q. Luo. Source code of accelerating truss decomposition on heterogeneous processors. https://github.com/RapidsAtHKUST/AccTrussDecomposition. Accessed in 2020.Google Scholar
- N. Chiba and T. Nishizeki. Arboricity and subgraph listing algorithms. SIAM Journal on computing, 14(1):210--223, 1985. Google ScholarDigital Library
- J. Cohen. Trusses: Cohesive subgraphs for social network analysis. National security agency technical report, 16:3--1, 2008.Google Scholar
- K. Date, K. Feng, R. Nagi, J. Xiong, N. S. Kim, and W.-M. Hwu. Collaborative (cpu+ gpu) algorithms for triangle counting and truss decomposition on the minsky architecture: Static graph challenge: Subgraph isomorphism. In HPEC, pages 1--7. IEEE, 2017.Google ScholarCross Ref
- L. Dhulipala, G. Blelloch, and J. Shun. Julienne: A framework for parallel graph algorithms using work-efficient bucketing. In SPAA, pages 293--304, 2017. Google ScholarDigital Library
- C. Gui, L. Zheng, P. Yao, X. Liao, and H. Jin. Fast triangle counting on GPU. In HPEC, pages 1--7. IEEE, 2019. Google ScholarDigital Library
- Y. Hu, H. Liu, and H. H. Huang. High-performance triangle counting on gpus. In HPEC, pages 1--5. IEEE, 2018.Google ScholarCross Ref
- X. Huang, H. Cheng, L. Qin, W. Tian, and J. X. Yu. Querying k-truss community in large and dynamic graphs. In SIGMOD, pages 1311--1322, 2014. Google ScholarDigital Library
- H. Kabir and K. Madduri. Parallel k-truss decomposition on multicore systems. In HPEC, pages 1--7. IEEE, 2017.Google ScholarCross Ref
- H. Kabir and K. Madduri. Shared-memory graph truss decomposition. In HiPC, pages 13--22. IEEE, 2017.Google ScholarCross Ref
- J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014. Accessed in 2020.Google Scholar
- L. Lü, T. Zhou, Q.-M. Zhang, and H. E. Stanley. The h-index of a network node and its relation to degree and coreness. Nature communications, 7:10168, 2016.Google ScholarCross Ref
- V. S. Mailthody, K. Date, Z. Qureshi, C. Pearson, R. Nagi, J. Xiong, and W.-m. Hwu. Collaborative (cpu+ gpu) algorithms for triangle counting and truss decomposition. In HPEC, pages 1--7. IEEE, 2018.Google Scholar
- A. Montresor, F. De Pellegrini, and D. Miorandi. Distributed k-core decomposition. TPDS, 24(2):288--300, 2012. Google ScholarDigital Library
- S. Pandey, X. S. Li, A. Buluç, J. Xu, and H. Liu. H-INDEX: hash-indexing for parallel triangle counting on gpus. In HPEC, pages 1--7. IEEE, 2019.Google ScholarCross Ref
- R. Pearce. Triangle counting for scale-free graphs at scale in distributed memory. In HPEC, pages 1--4. IEEE, 2017.Google ScholarCross Ref
- R. A. Rossi. Fast triangle core decomposition for mining large graphs. In PAKDD, pages 310--322. Springer, 2014.Google ScholarCross Ref
- A. E. Sariyüce and A. Pinar. Fast hierarchy construction for dense subgraphs. PVLDB, 10(3):97--108, 2016. Google ScholarDigital Library
- A. E. Sariyüce, C. Seshadhri, and A. Pinar. Local algorithms for hierarchical dense subgraph discovery. PVLDB, 12(1):43--56, 2018. Google ScholarDigital Library
- A. E. Sariyuce, C. Seshadhri, A. Pinar, and U. V. Catalyurek. Finding the hierarchy of dense subgraphs using nucleus decompositions. In WWW, pages 927--937, 2015. Google ScholarDigital Library
- J. Shun and G. E. Blelloch. Ligra: a lightweight graph processing framework for shared memory. In PPoPP, pages 135--146, 2013. Google ScholarDigital Library
- J. Singler, P. Sanders, and F. Putze. Mcstl: The multi-core standard template library. In European Conference on Parallel Processing, pages 682--694. Springer, 2007. Google ScholarDigital Library
- S. Smith, X. Liu, N. K. Ahmed, A. S. Tom, F. Petrini, and G. Karypis. Truss decomposition on shared-memory parallel systems. In HPEC, pages 1--6. IEEE, 2017.Google ScholarCross Ref
- S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In WWW, pages 607--614, 2011. Google ScholarDigital Library
- A. S. Tom, N. Sundaram, N. K. Ahmed, S. Smith, S. Eyerman, M. Kodiyath, I. Hur, F. Petrini, and G. Karypis. Exploring optimizations on shared-memory platforms for parallel triangle counting algorithms. In HPEC, pages 1--7. IEEE, 2017.Google ScholarCross Ref
- C. Voegele, Y.-S. Lu, S. Pai, and K. Pingali. Parallel triangle counting and k-truss identification using graph-centric methods. In HPEC, pages 1--7. IEEE, 2017.Google ScholarCross Ref
- J. Wang and J. Cheng. Truss decomposition in massive networks. PVLDB, 5(9):812--823, 2012. Google ScholarDigital Library
- J. Wu, A. Goshulak, V. Srinivasan, and A. Thomo. K-truss decomposition of large networks on a single consumer-grade machine. In ASONAM, pages 873--880. IEEE, 2018. Google ScholarDigital Library
- J. Zhang, D. G. Spampinato, S. McMillan, and F. Franchetti. Preliminary exploration of large-scale triangle counting on shared-memory multicore system. In HPEC, pages 1--6. IEEE, 2018.Google ScholarCross Ref
- Y. Zhang and J. X. Yu. Unboundedness and efficiency of truss maintenance in evolving graphs. In SIGMOD, pages 1024--1041, 2019. Google ScholarDigital Library
Recommendations
Accelerating DynEarthSol3D on tightly coupled CPU-GPU heterogeneous processors
DynEarthSol3D (Dynamic Earth Solver in Three Dimensions) is a flexible, open-source finite element solver that models the momentum balance and the heat transfer of elasto-visco-plastic material in the Lagrangian form using unstructured meshes. It ...
Accelerating simulation of agent-based models on heterogeneous architectures
GPGPU-6: Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing UnitsThe wide usage of GPGPU programming models and compiler techniques enables the optimization of data-parallel programs on commodity GPUs. However, mapping GPGPU applications running on discrete parts to emerging integrated heterogeneous architectures ...
Comments