research-article

Accelerating dynamic graph analytics on GPUs

Authors:
Mo Sha

National University of Singapore, Singapore

National University of Singapore, Singapore
View Profile

,
Yuchen Li

National University of Singapore, Singapore

National University of Singapore, Singapore
View Profile

,
Bingsheng He

National University of Singapore, Singapore

National University of Singapore, Singapore
View Profile

,
Kian-Lee Tan

National University of Singapore, Singapore

National University of Singapore, Singapore
View Profile

Proceedings of the VLDB Endowment Volume 11 Issue 1pp 107–120https://doi.org/10.14778/3151113.3151122

Published:01 September 2017Publication History

Proceedings of the VLDB Endowment

Abstract

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative graphs evolve frequently and one has to perform a rebuild of the graph structure on GPUs to incorporate the updates. Hence, rebuilding the graphs becomes the bottleneck of processing high-speed graph streams. In this paper, we propose a GPU-based dynamic graph storage scheme to support existing graph algorithms easily. Furthermore, we propose parallel update algorithms to support efficient stream updates so that the maintained graph is immediately available for high-speed analytic processing on GPUs. Our extensive experiments with three streaming applications on large-scale real and synthetic datasets demonstrate the superior performance of our proposed approach.

References

Apache flink. https://flink.apache.org/. Accessed: 2016-10-18.Google Scholar
Cusp library. https://developer.nvidia.com/cusp. Accessed: 2017-03-25.Google Scholar
cusparse. https://developer.nvidia.com/cusparse. Accessed: 2016-11-09.Google Scholar
CUDA UnBound (CUB) library. https://nvlabs.github.io/cub/, 2015.Google Scholar
L. Akoglu, H. Tong, and D. Koutra. Graph based anomaly detection and description: a survey. Data Min. Knowl. Discov., 29(3):626--688, 2015. Google ScholarDigital Library
A. Ashari, N. Sedaghati, J. Eisenlohr, S. Parthasarathy, and P. Sadayappan. Fast sparse matrix-vector multiplication on gpus for graph applications. In SC, pages 781--792, 2014. Google ScholarDigital Library
A. Ashari, N. Sedaghati, J. Eisenlohr, and P. Sadayappan. An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on gpus. In ICS, pages 273--282, 2014. Google ScholarDigital Library
D. A. Bader, J. Berry, A. Amos-Binks, D. Chavarría-Miranda, C. Hastings, K. Madduri, and S. C. Poulos. Stinger: Spatio-temporal interaction networks and graphs (sting) extensible representation. Georgia Institute of Technology, Tech. Rep, 2009.Google Scholar
N. Bell and M. Garland. Efficient sparse matrix-vector multiplication on CUDA. Technical Report NVR-2008--004, NVIDIA Corporation, 2008.Google Scholar
M. A. Bender, E. D. Demaine, and M. Farach-Colton. Cache-oblivious b-trees. SIAM J. Comput., 35(2):341--358, 2005. Google ScholarDigital Library
M. A. Bender and H. Hu. An adaptive packed-memory array. ACM Trans. Database Syst., 32(4), 2007. Google ScholarDigital Library
L. Braun, T. Etter, G. Gasparis, M. Kaufmann, D. Kossmann, D. Widmer, A. Avitzur, A. Iliopoulos, E. Levy, and N. Liang. Analytics in motion: High performance event-processing and real-time analytics in the same database. In SIGMOD, pages 251--264, 2015. Google ScholarDigital Library
F. Busato and N. Bombieri. Bfs-4k: an efficient implementation of bfs for kepler gpu architectures. TPDS, 26(7):1826--1838, 2015.Google ScholarCross Ref
R. Cheng, J. Hong, A. Kyrola, Y. Miao, X. Weng, M. Wu, F. Yang, L. Zhou, F. Zhao, and E. Chen. Kineograph: Taking the pulse of a fast-changing and connected world. In EuroSys, pages 85--98, 2012. Google ScholarDigital Library
M. S. Crouch, A. McGregor, and D. Stubbs. Dynamic graphs in the sliding-window model. In European Symposium on Algorithms, pages 337--348. Springer, 2013.Google ScholarCross Ref
H.-V. Dang and B. Schmidt. The sliced coo format for sparse matrix-vector multiplication on cuda-enabled gpus. Procedia Computer Science, 9:57--66, 2012.Google ScholarCross Ref
M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. SIAM journal on computing, 31(6):1794--1813, 2002. Google ScholarDigital Library
A. Davidson, S. Baxter, M. Garland, and J. D. Owens. Work-efficient parallel gpu methods for single-source shortest paths. In Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, pages 349--359. IEEE, 2014. Google ScholarDigital Library
D. Ediger, R. McColl, E. J. Riedy, and D. A. Bader. STINGER - High performance data structure for streaming graphs. HPEC, 2012.Google ScholarCross Ref
M. Elkin. Streaming and fully dynamic centralized algorithms for constructing and maintaining sparse spanners. ACM Trans. Algorithms, 7(2):20:1--20:17, 2011. Google ScholarDigital Library
J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theor. Comput. Sci., 348(2--3):207--216, 2005. Google ScholarDigital Library
Z. Fu, M. Personick, and B. Thompson. MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs. A High Level API for Fast Development of High Performance Graph Analytics on GPUs. ACM, New York, New York, USA, June 2014.Google Scholar
S. Guha and A. McGregor. Graph synopses, sketches, and streams: A survey. PVLDB, 5(12):2030--2031, 2012. Google ScholarDigital Library
W. Guo, Y. Li, M. Sha, and K.-L. Tan. Parallel personalized pagerank on dynamic graphs. PVLDB, 11(1), 2017. Google ScholarDigital Library
P. Harish and P. Narayanan. Accelerating large graph algorithms on the gpu using cuda. In International Conference on High-Performance Computing, pages 197--208. Springer, 2007. Google ScholarDigital Library
D. S. Hirschberg. Parallel algorithms for the transitive closure and the connected component problems. In Proceedings of the eighth annual ACM symposium on Theory of computing, pages 55--57. ACM, 1976. Google ScholarDigital Library
A. P. Iyer, L. E. Li, T. Das, and I. Stoica. Time-evolving graph processing at scale. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, pages 5:1--5:6, 2016. Google ScholarDigital Library
A. P. Iyer, L. E. Li, and I. Stoica. Celliq : Real-time cellular network analytics at scale. In NSDI, pages 309--322, 2015. Google ScholarDigital Library
R. Kaleem, A. Venkat, S. Pai, M. Hall, and K. Pingali. Synchronization trade-offs in gpu implementations of graph algorithms. In Parallel and Distributed Processing Symposium, 2016 IEEE International, pages 514--523. IEEE, 2016.Google ScholarCross Ref
J. King, T. Gilray, R. M. Kirby, and M. Might. Dynamic sparse-matrix allocation on gpus. In ISC, pages 61--80, 2016.Google ScholarCross Ref
J. Leskovec and R. Sosič. Snap: A general-purpose network analysis and graph-mining library. TIST, 8(1):1, 2016. Google ScholarDigital Library
X. Lin, R. Zhang, Z. Wen, H. Wang, and J. Qi. Efficient subgraph matching using gpus. In ADC, pages 74--85, 2014.Google ScholarCross Ref
H. Liu, H. H. Huang, and Y. Hu. ibfs: Concurrent breadth-first search on gpus. In SIGMOD, pages 403--416, 2016. Google ScholarDigital Library
L. Luo, M. Wong, and W.-m. Hwu. An effective gpu implementation of breadth-first search. In DAC, pages 52--55, 2010. Google ScholarDigital Library
M. Martone, S. Filippone, S. Tucci, P. Gepner, and M. Paprzycki. Use of hybrid recursive csr/coo data structures in sparse matrix-vector multiplication. In IMCSIT, pages 327--335. IEEE, 2010.Google ScholarCross Ref
A. McGregor. Graph stream algorithms: A survey. SIGMOD Rec., 43(1):9--20, 2014. Google ScholarDigital Library
D. Merrill, M. Garland, and A. Grimshaw. High-Performance and Scalable GPU Graph Traversal. TOPC, 1(2), 2015. Google ScholarDigital Library
R. C. Murphy, K. B. Wheeler, B. W. Barrett, and J. A. Ang. Introducing the graph 500. 2010.Google Scholar
N. Ohsaka, T. Maehara, and K.-i. Kawarabayashi. Efficient pagerank tracking in evolving networks. In KDD, pages 875--884, 2015. Google ScholarDigital Library
Y. Saad. Numerical solution of large nonsymmetric eigenvalue problems. Computer Physics Communications, 53(1):71--90, 1989.Google ScholarCross Ref
D. Sayce. 10 billions tweets, number of tweets per day. http://www.dsayce.com/social-media/10-billions-tweets/. Accessed: 2016-10-18.Google Scholar
M. Sha, Y. Li, B. He, and K.-L. Tan. Technical report: Accelerating dynamic graph analytics on gpus. arXiv preprint arXiv:1709.05061, 2017.Google Scholar
J. Soman, K. Kothapalli, and P. J. Narayanan. A fast GPU algorithm for graph connectivity. IPDPS Workshops, 2010.Google ScholarCross Ref
M. Stonebraker, U. Çetintemel, and S. Zdonik. The 8 requirements of real-time stream processing. ACM SIGMOD Record, 34(4):42--47, 2005. Google ScholarDigital Library
J. A. Stratton, N. Anssari, C. Rodrigues, I.-J. Sung, N. Obeid, L. Chang, G. D. Liu, and W.-m. Hwu. Optimization and architecture effects on gpu computing workload performance. In InPar, pages 1--10, 2012.Google ScholarCross Ref
N. Tang, Q. Chen, and P. Mitra. Graph stream summarization: From big bang to big crunch. In SIGMOD, pages 1481--1496, 2016. Google ScholarDigital Library
A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, M. Fu, J. Donham, N. Bhagat, S. Mittal, and D. Ryaboy. Storm@twitter. In SIGMOD, pages 147--156, 2014. Google ScholarDigital Library
C. E. Tsourakakis, U. Kang, G. L. Miller, and C. Faloutsos. DOULION: counting triangles in massive graphs with a coin. In SIGKDD, pages 837--846, 2009. Google ScholarDigital Library
U. Verner, A. Schuster, M. Silberstein, and A. Mendelson. Scheduling processing of real-time data streams on heterogeneous multi-gpu systems. In SYSTOR, page 7, 2012. Google ScholarDigital Library
Y. Wang, A. Davidson, Y. Pan, Y. Wu, A. Riffel, and J. D. Owens. Gunrock: A high-performance graph processing library on the gpu. SIGPLAN Not., 50(8):265--266, 2015. Google ScholarDigital Library
Y. Wang, Q. Fan, Y. Li, and K.-L. Tan. Real-time influence maximization on dynamic social streams. PVLDB, 10(7):805--816, 2017. Google ScholarDigital Library
S. Yan, C. Li, Y. Zhang, and H. Zhou. yaspmv: yet another spmv framework on gpus. In SIGPLAN Notices, volume 49, pages 107--118, 2014. Google ScholarDigital Library
X. Yang, S. Parthasarathy, and P. Sadayappan. Fast sparse matrix-vector multiplication on gpus: Implications for graph mining. PVLDB, 4(4):231--242, 2011. Google ScholarDigital Library
Y. Yang, Z. Wang, J. Pei, and E. Chen. Tracking influential nodes in dynamic networks. arXiv preprint arXiv:1602.04490, 2016.Google Scholar
M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized streams: Fault-tolerant streaming computation at scale. In SOSP, pages 423--438, 2013. Google ScholarDigital Library
H. Zhang, P. Lofgren, and A. Goel. Approximate personalized pagerank on dynamic graphs. arXiv preprint arXiv:1603.07796, 2016.Google Scholar
Y. Zhang and F. Mueller. Gstream: A general-purpose data streaming framework on GPU clusters. In ICPP, pages 245--254, 2011. Google ScholarDigital Library
J. Zhong and B. He. Medusa: Simplified graph processing on gpus. IEEE Trans. Parallel Distrib. Syst., 25(6):1543--1552, 2014. Google ScholarDigital Library

Recommendations

Accelerating CUDA graph algorithms at maximum warp
PPoPP '11

Graphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems but their performance suffered heavily when the graph structure ...
Read More
Accelerating Radiative Transfer Simulation on NVIDIA GPUs with OpenACC
Parallel and Distributed Computing, Applications and Technologies
Abstract
To accelerate multiphysics applications, making use of not only GPUs but also FPGAs has been emerging. Multiphysics applications are simulations involving multiple physical models and multiple simultaneous physical phenomena. Operations with ...
Read More
Accelerating CUDA graph algorithms at maximum warp
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming

Graphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems but their performance suffered heavily when the graph structure ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 11, Issue 1
Proceedings of the 44th International Conference on Very Large Data Bases, Rio de Janeiro, Brazil
September 2017
120 pages
ISSN:2150-8097
Editors:
Jian Pei
Simon Fraser University
,
Sihem Amer-Yahia
University of Grenoble Alpes, CNRS
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 September 2017
Published in pvldb Volume 11, Issue 1
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 28
  Total Citations
  View Citations
- 340
  Total Downloads
- Downloads (Last 12 months)89
- Downloads (Last 6 weeks)21
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Accelerating dynamic graph analytics on GPUs

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Accelerating CUDA graph algorithms at maximum warp

Accelerating Radiative Transfer Simulation on NVIDIA GPUs with OpenACC

Accelerating CUDA graph algorithms at maximum warp

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Accelerating dynamic graph analytics on GPUs

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Accelerating CUDA graph algorithms at maximum warp

Accelerating Radiative Transfer Simulation on NVIDIA GPUs with OpenACC

Accelerating CUDA graph algorithms at maximum warp

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media