Abstract
As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative graphs evolve frequently and one has to perform a rebuild of the graph structure on GPUs to incorporate the updates. Hence, rebuilding the graphs becomes the bottleneck of processing high-speed graph streams. In this paper, we propose a GPU-based dynamic graph storage scheme to support existing graph algorithms easily. Furthermore, we propose parallel update algorithms to support efficient stream updates so that the maintained graph is immediately available for high-speed analytic processing on GPUs. Our extensive experiments with three streaming applications on large-scale real and synthetic datasets demonstrate the superior performance of our proposed approach.
- Apache flink. https://flink.apache.org/. Accessed: 2016-10-18.Google Scholar
- Cusp library. https://developer.nvidia.com/cusp. Accessed: 2017-03-25.Google Scholar
- cusparse. https://developer.nvidia.com/cusparse. Accessed: 2016-11-09.Google Scholar
- CUDA UnBound (CUB) library. https://nvlabs.github.io/cub/, 2015.Google Scholar
- L. Akoglu, H. Tong, and D. Koutra. Graph based anomaly detection and description: a survey. Data Min. Knowl. Discov., 29(3):626--688, 2015. Google ScholarDigital Library
- A. Ashari, N. Sedaghati, J. Eisenlohr, S. Parthasarathy, and P. Sadayappan. Fast sparse matrix-vector multiplication on gpus for graph applications. In SC, pages 781--792, 2014. Google ScholarDigital Library
- A. Ashari, N. Sedaghati, J. Eisenlohr, and P. Sadayappan. An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on gpus. In ICS, pages 273--282, 2014. Google ScholarDigital Library
- D. A. Bader, J. Berry, A. Amos-Binks, D. Chavarría-Miranda, C. Hastings, K. Madduri, and S. C. Poulos. Stinger: Spatio-temporal interaction networks and graphs (sting) extensible representation. Georgia Institute of Technology, Tech. Rep, 2009.Google Scholar
- N. Bell and M. Garland. Efficient sparse matrix-vector multiplication on CUDA. Technical Report NVR-2008--004, NVIDIA Corporation, 2008.Google Scholar
- M. A. Bender, E. D. Demaine, and M. Farach-Colton. Cache-oblivious b-trees. SIAM J. Comput., 35(2):341--358, 2005. Google ScholarDigital Library
- M. A. Bender and H. Hu. An adaptive packed-memory array. ACM Trans. Database Syst., 32(4), 2007. Google ScholarDigital Library
- L. Braun, T. Etter, G. Gasparis, M. Kaufmann, D. Kossmann, D. Widmer, A. Avitzur, A. Iliopoulos, E. Levy, and N. Liang. Analytics in motion: High performance event-processing and real-time analytics in the same database. In SIGMOD, pages 251--264, 2015. Google ScholarDigital Library
- F. Busato and N. Bombieri. Bfs-4k: an efficient implementation of bfs for kepler gpu architectures. TPDS, 26(7):1826--1838, 2015.Google ScholarCross Ref
- R. Cheng, J. Hong, A. Kyrola, Y. Miao, X. Weng, M. Wu, F. Yang, L. Zhou, F. Zhao, and E. Chen. Kineograph: Taking the pulse of a fast-changing and connected world. In EuroSys, pages 85--98, 2012. Google ScholarDigital Library
- M. S. Crouch, A. McGregor, and D. Stubbs. Dynamic graphs in the sliding-window model. In European Symposium on Algorithms, pages 337--348. Springer, 2013.Google ScholarCross Ref
- H.-V. Dang and B. Schmidt. The sliced coo format for sparse matrix-vector multiplication on cuda-enabled gpus. Procedia Computer Science, 9:57--66, 2012.Google ScholarCross Ref
- M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. SIAM journal on computing, 31(6):1794--1813, 2002. Google ScholarDigital Library
- A. Davidson, S. Baxter, M. Garland, and J. D. Owens. Work-efficient parallel gpu methods for single-source shortest paths. In Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, pages 349--359. IEEE, 2014. Google ScholarDigital Library
- D. Ediger, R. McColl, E. J. Riedy, and D. A. Bader. STINGER - High performance data structure for streaming graphs. HPEC, 2012.Google ScholarCross Ref
- M. Elkin. Streaming and fully dynamic centralized algorithms for constructing and maintaining sparse spanners. ACM Trans. Algorithms, 7(2):20:1--20:17, 2011. Google ScholarDigital Library
- J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theor. Comput. Sci., 348(2--3):207--216, 2005. Google ScholarDigital Library
- Z. Fu, M. Personick, and B. Thompson. MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs. A High Level API for Fast Development of High Performance Graph Analytics on GPUs. ACM, New York, New York, USA, June 2014.Google Scholar
- S. Guha and A. McGregor. Graph synopses, sketches, and streams: A survey. PVLDB, 5(12):2030--2031, 2012. Google ScholarDigital Library
- W. Guo, Y. Li, M. Sha, and K.-L. Tan. Parallel personalized pagerank on dynamic graphs. PVLDB, 11(1), 2017. Google ScholarDigital Library
- P. Harish and P. Narayanan. Accelerating large graph algorithms on the gpu using cuda. In International Conference on High-Performance Computing, pages 197--208. Springer, 2007. Google ScholarDigital Library
- D. S. Hirschberg. Parallel algorithms for the transitive closure and the connected component problems. In Proceedings of the eighth annual ACM symposium on Theory of computing, pages 55--57. ACM, 1976. Google ScholarDigital Library
- A. P. Iyer, L. E. Li, T. Das, and I. Stoica. Time-evolving graph processing at scale. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, pages 5:1--5:6, 2016. Google ScholarDigital Library
- A. P. Iyer, L. E. Li, and I. Stoica. Celliq : Real-time cellular network analytics at scale. In NSDI, pages 309--322, 2015. Google ScholarDigital Library
- R. Kaleem, A. Venkat, S. Pai, M. Hall, and K. Pingali. Synchronization trade-offs in gpu implementations of graph algorithms. In Parallel and Distributed Processing Symposium, 2016 IEEE International, pages 514--523. IEEE, 2016.Google ScholarCross Ref
- J. King, T. Gilray, R. M. Kirby, and M. Might. Dynamic sparse-matrix allocation on gpus. In ISC, pages 61--80, 2016.Google ScholarCross Ref
- J. Leskovec and R. Sosič. Snap: A general-purpose network analysis and graph-mining library. TIST, 8(1):1, 2016. Google ScholarDigital Library
- X. Lin, R. Zhang, Z. Wen, H. Wang, and J. Qi. Efficient subgraph matching using gpus. In ADC, pages 74--85, 2014.Google ScholarCross Ref
- H. Liu, H. H. Huang, and Y. Hu. ibfs: Concurrent breadth-first search on gpus. In SIGMOD, pages 403--416, 2016. Google ScholarDigital Library
- L. Luo, M. Wong, and W.-m. Hwu. An effective gpu implementation of breadth-first search. In DAC, pages 52--55, 2010. Google ScholarDigital Library
- M. Martone, S. Filippone, S. Tucci, P. Gepner, and M. Paprzycki. Use of hybrid recursive csr/coo data structures in sparse matrix-vector multiplication. In IMCSIT, pages 327--335. IEEE, 2010.Google ScholarCross Ref
- A. McGregor. Graph stream algorithms: A survey. SIGMOD Rec., 43(1):9--20, 2014. Google ScholarDigital Library
- D. Merrill, M. Garland, and A. Grimshaw. High-Performance and Scalable GPU Graph Traversal. TOPC, 1(2), 2015. Google ScholarDigital Library
- R. C. Murphy, K. B. Wheeler, B. W. Barrett, and J. A. Ang. Introducing the graph 500. 2010.Google Scholar
- N. Ohsaka, T. Maehara, and K.-i. Kawarabayashi. Efficient pagerank tracking in evolving networks. In KDD, pages 875--884, 2015. Google ScholarDigital Library
- Y. Saad. Numerical solution of large nonsymmetric eigenvalue problems. Computer Physics Communications, 53(1):71--90, 1989.Google ScholarCross Ref
- D. Sayce. 10 billions tweets, number of tweets per day. http://www.dsayce.com/social-media/10-billions-tweets/. Accessed: 2016-10-18.Google Scholar
- M. Sha, Y. Li, B. He, and K.-L. Tan. Technical report: Accelerating dynamic graph analytics on gpus. arXiv preprint arXiv:1709.05061, 2017.Google Scholar
- J. Soman, K. Kothapalli, and P. J. Narayanan. A fast GPU algorithm for graph connectivity. IPDPS Workshops, 2010.Google ScholarCross Ref
- M. Stonebraker, U. Çetintemel, and S. Zdonik. The 8 requirements of real-time stream processing. ACM SIGMOD Record, 34(4):42--47, 2005. Google ScholarDigital Library
- J. A. Stratton, N. Anssari, C. Rodrigues, I.-J. Sung, N. Obeid, L. Chang, G. D. Liu, and W.-m. Hwu. Optimization and architecture effects on gpu computing workload performance. In InPar, pages 1--10, 2012.Google ScholarCross Ref
- N. Tang, Q. Chen, and P. Mitra. Graph stream summarization: From big bang to big crunch. In SIGMOD, pages 1481--1496, 2016. Google ScholarDigital Library
- A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, M. Fu, J. Donham, N. Bhagat, S. Mittal, and D. Ryaboy. Storm@twitter. In SIGMOD, pages 147--156, 2014. Google ScholarDigital Library
- C. E. Tsourakakis, U. Kang, G. L. Miller, and C. Faloutsos. DOULION: counting triangles in massive graphs with a coin. In SIGKDD, pages 837--846, 2009. Google ScholarDigital Library
- U. Verner, A. Schuster, M. Silberstein, and A. Mendelson. Scheduling processing of real-time data streams on heterogeneous multi-gpu systems. In SYSTOR, page 7, 2012. Google ScholarDigital Library
- Y. Wang, A. Davidson, Y. Pan, Y. Wu, A. Riffel, and J. D. Owens. Gunrock: A high-performance graph processing library on the gpu. SIGPLAN Not., 50(8):265--266, 2015. Google ScholarDigital Library
- Y. Wang, Q. Fan, Y. Li, and K.-L. Tan. Real-time influence maximization on dynamic social streams. PVLDB, 10(7):805--816, 2017. Google ScholarDigital Library
- S. Yan, C. Li, Y. Zhang, and H. Zhou. yaspmv: yet another spmv framework on gpus. In SIGPLAN Notices, volume 49, pages 107--118, 2014. Google ScholarDigital Library
- X. Yang, S. Parthasarathy, and P. Sadayappan. Fast sparse matrix-vector multiplication on gpus: Implications for graph mining. PVLDB, 4(4):231--242, 2011. Google ScholarDigital Library
- Y. Yang, Z. Wang, J. Pei, and E. Chen. Tracking influential nodes in dynamic networks. arXiv preprint arXiv:1602.04490, 2016.Google Scholar
- M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized streams: Fault-tolerant streaming computation at scale. In SOSP, pages 423--438, 2013. Google ScholarDigital Library
- H. Zhang, P. Lofgren, and A. Goel. Approximate personalized pagerank on dynamic graphs. arXiv preprint arXiv:1603.07796, 2016.Google Scholar
- Y. Zhang and F. Mueller. Gstream: A general-purpose data streaming framework on GPU clusters. In ICPP, pages 245--254, 2011. Google ScholarDigital Library
- J. Zhong and B. He. Medusa: Simplified graph processing on gpus. IEEE Trans. Parallel Distrib. Syst., 25(6):1543--1552, 2014. Google ScholarDigital Library
Recommendations
Accelerating CUDA graph algorithms at maximum warp
PPoPP '11Graphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems but their performance suffered heavily when the graph structure ...
Accelerating Radiative Transfer Simulation on NVIDIA GPUs with OpenACC
Parallel and Distributed Computing, Applications and TechnologiesAbstractTo accelerate multiphysics applications, making use of not only GPUs but also FPGAs has been emerging. Multiphysics applications are simulations involving multiple physical models and multiple simultaneous physical phenomena. Operations with ...
Accelerating CUDA graph algorithms at maximum warp
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programmingGraphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems but their performance suffered heavily when the graph structure ...
Comments