skip to main content
research-article

Accelerating dynamic graph analytics on GPUs

Published:01 September 2017Publication History
Skip Abstract Section

Abstract

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative graphs evolve frequently and one has to perform a rebuild of the graph structure on GPUs to incorporate the updates. Hence, rebuilding the graphs becomes the bottleneck of processing high-speed graph streams. In this paper, we propose a GPU-based dynamic graph storage scheme to support existing graph algorithms easily. Furthermore, we propose parallel update algorithms to support efficient stream updates so that the maintained graph is immediately available for high-speed analytic processing on GPUs. Our extensive experiments with three streaming applications on large-scale real and synthetic datasets demonstrate the superior performance of our proposed approach.

References

  1. Apache flink. https://flink.apache.org/. Accessed: 2016-10-18.Google ScholarGoogle Scholar
  2. Cusp library. https://developer.nvidia.com/cusp. Accessed: 2017-03-25.Google ScholarGoogle Scholar
  3. cusparse. https://developer.nvidia.com/cusparse. Accessed: 2016-11-09.Google ScholarGoogle Scholar
  4. CUDA UnBound (CUB) library. https://nvlabs.github.io/cub/, 2015.Google ScholarGoogle Scholar
  5. L. Akoglu, H. Tong, and D. Koutra. Graph based anomaly detection and description: a survey. Data Min. Knowl. Discov., 29(3):626--688, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Ashari, N. Sedaghati, J. Eisenlohr, S. Parthasarathy, and P. Sadayappan. Fast sparse matrix-vector multiplication on gpus for graph applications. In SC, pages 781--792, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Ashari, N. Sedaghati, J. Eisenlohr, and P. Sadayappan. An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on gpus. In ICS, pages 273--282, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. D. A. Bader, J. Berry, A. Amos-Binks, D. Chavarría-Miranda, C. Hastings, K. Madduri, and S. C. Poulos. Stinger: Spatio-temporal interaction networks and graphs (sting) extensible representation. Georgia Institute of Technology, Tech. Rep, 2009.Google ScholarGoogle Scholar
  9. N. Bell and M. Garland. Efficient sparse matrix-vector multiplication on CUDA. Technical Report NVR-2008--004, NVIDIA Corporation, 2008.Google ScholarGoogle Scholar
  10. M. A. Bender, E. D. Demaine, and M. Farach-Colton. Cache-oblivious b-trees. SIAM J. Comput., 35(2):341--358, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. A. Bender and H. Hu. An adaptive packed-memory array. ACM Trans. Database Syst., 32(4), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. L. Braun, T. Etter, G. Gasparis, M. Kaufmann, D. Kossmann, D. Widmer, A. Avitzur, A. Iliopoulos, E. Levy, and N. Liang. Analytics in motion: High performance event-processing and real-time analytics in the same database. In SIGMOD, pages 251--264, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. F. Busato and N. Bombieri. Bfs-4k: an efficient implementation of bfs for kepler gpu architectures. TPDS, 26(7):1826--1838, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  14. R. Cheng, J. Hong, A. Kyrola, Y. Miao, X. Weng, M. Wu, F. Yang, L. Zhou, F. Zhao, and E. Chen. Kineograph: Taking the pulse of a fast-changing and connected world. In EuroSys, pages 85--98, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. S. Crouch, A. McGregor, and D. Stubbs. Dynamic graphs in the sliding-window model. In European Symposium on Algorithms, pages 337--348. Springer, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  16. H.-V. Dang and B. Schmidt. The sliced coo format for sparse matrix-vector multiplication on cuda-enabled gpus. Procedia Computer Science, 9:57--66, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  17. M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. SIAM journal on computing, 31(6):1794--1813, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. A. Davidson, S. Baxter, M. Garland, and J. D. Owens. Work-efficient parallel gpu methods for single-source shortest paths. In Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, pages 349--359. IEEE, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. D. Ediger, R. McColl, E. J. Riedy, and D. A. Bader. STINGER - High performance data structure for streaming graphs. HPEC, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  20. M. Elkin. Streaming and fully dynamic centralized algorithms for constructing and maintaining sparse spanners. ACM Trans. Algorithms, 7(2):20:1--20:17, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theor. Comput. Sci., 348(2--3):207--216, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Z. Fu, M. Personick, and B. Thompson. MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs. A High Level API for Fast Development of High Performance Graph Analytics on GPUs. ACM, New York, New York, USA, June 2014.Google ScholarGoogle Scholar
  23. S. Guha and A. McGregor. Graph synopses, sketches, and streams: A survey. PVLDB, 5(12):2030--2031, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. W. Guo, Y. Li, M. Sha, and K.-L. Tan. Parallel personalized pagerank on dynamic graphs. PVLDB, 11(1), 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. P. Harish and P. Narayanan. Accelerating large graph algorithms on the gpu using cuda. In International Conference on High-Performance Computing, pages 197--208. Springer, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. D. S. Hirschberg. Parallel algorithms for the transitive closure and the connected component problems. In Proceedings of the eighth annual ACM symposium on Theory of computing, pages 55--57. ACM, 1976. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. A. P. Iyer, L. E. Li, T. Das, and I. Stoica. Time-evolving graph processing at scale. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, pages 5:1--5:6, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. A. P. Iyer, L. E. Li, and I. Stoica. Celliq : Real-time cellular network analytics at scale. In NSDI, pages 309--322, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. R. Kaleem, A. Venkat, S. Pai, M. Hall, and K. Pingali. Synchronization trade-offs in gpu implementations of graph algorithms. In Parallel and Distributed Processing Symposium, 2016 IEEE International, pages 514--523. IEEE, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  30. J. King, T. Gilray, R. M. Kirby, and M. Might. Dynamic sparse-matrix allocation on gpus. In ISC, pages 61--80, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  31. J. Leskovec and R. Sosič. Snap: A general-purpose network analysis and graph-mining library. TIST, 8(1):1, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. X. Lin, R. Zhang, Z. Wen, H. Wang, and J. Qi. Efficient subgraph matching using gpus. In ADC, pages 74--85, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  33. H. Liu, H. H. Huang, and Y. Hu. ibfs: Concurrent breadth-first search on gpus. In SIGMOD, pages 403--416, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. L. Luo, M. Wong, and W.-m. Hwu. An effective gpu implementation of breadth-first search. In DAC, pages 52--55, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. M. Martone, S. Filippone, S. Tucci, P. Gepner, and M. Paprzycki. Use of hybrid recursive csr/coo data structures in sparse matrix-vector multiplication. In IMCSIT, pages 327--335. IEEE, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  36. A. McGregor. Graph stream algorithms: A survey. SIGMOD Rec., 43(1):9--20, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. D. Merrill, M. Garland, and A. Grimshaw. High-Performance and Scalable GPU Graph Traversal. TOPC, 1(2), 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. R. C. Murphy, K. B. Wheeler, B. W. Barrett, and J. A. Ang. Introducing the graph 500. 2010.Google ScholarGoogle Scholar
  39. N. Ohsaka, T. Maehara, and K.-i. Kawarabayashi. Efficient pagerank tracking in evolving networks. In KDD, pages 875--884, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Y. Saad. Numerical solution of large nonsymmetric eigenvalue problems. Computer Physics Communications, 53(1):71--90, 1989.Google ScholarGoogle ScholarCross RefCross Ref
  41. D. Sayce. 10 billions tweets, number of tweets per day. http://www.dsayce.com/social-media/10-billions-tweets/. Accessed: 2016-10-18.Google ScholarGoogle Scholar
  42. M. Sha, Y. Li, B. He, and K.-L. Tan. Technical report: Accelerating dynamic graph analytics on gpus. arXiv preprint arXiv:1709.05061, 2017.Google ScholarGoogle Scholar
  43. J. Soman, K. Kothapalli, and P. J. Narayanan. A fast GPU algorithm for graph connectivity. IPDPS Workshops, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  44. M. Stonebraker, U. Çetintemel, and S. Zdonik. The 8 requirements of real-time stream processing. ACM SIGMOD Record, 34(4):42--47, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. J. A. Stratton, N. Anssari, C. Rodrigues, I.-J. Sung, N. Obeid, L. Chang, G. D. Liu, and W.-m. Hwu. Optimization and architecture effects on gpu computing workload performance. In InPar, pages 1--10, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  46. N. Tang, Q. Chen, and P. Mitra. Graph stream summarization: From big bang to big crunch. In SIGMOD, pages 1481--1496, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, M. Fu, J. Donham, N. Bhagat, S. Mittal, and D. Ryaboy. Storm@twitter. In SIGMOD, pages 147--156, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. C. E. Tsourakakis, U. Kang, G. L. Miller, and C. Faloutsos. DOULION: counting triangles in massive graphs with a coin. In SIGKDD, pages 837--846, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. U. Verner, A. Schuster, M. Silberstein, and A. Mendelson. Scheduling processing of real-time data streams on heterogeneous multi-gpu systems. In SYSTOR, page 7, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Y. Wang, A. Davidson, Y. Pan, Y. Wu, A. Riffel, and J. D. Owens. Gunrock: A high-performance graph processing library on the gpu. SIGPLAN Not., 50(8):265--266, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Y. Wang, Q. Fan, Y. Li, and K.-L. Tan. Real-time influence maximization on dynamic social streams. PVLDB, 10(7):805--816, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. S. Yan, C. Li, Y. Zhang, and H. Zhou. yaspmv: yet another spmv framework on gpus. In SIGPLAN Notices, volume 49, pages 107--118, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. X. Yang, S. Parthasarathy, and P. Sadayappan. Fast sparse matrix-vector multiplication on gpus: Implications for graph mining. PVLDB, 4(4):231--242, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Y. Yang, Z. Wang, J. Pei, and E. Chen. Tracking influential nodes in dynamic networks. arXiv preprint arXiv:1602.04490, 2016.Google ScholarGoogle Scholar
  55. M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized streams: Fault-tolerant streaming computation at scale. In SOSP, pages 423--438, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. H. Zhang, P. Lofgren, and A. Goel. Approximate personalized pagerank on dynamic graphs. arXiv preprint arXiv:1603.07796, 2016.Google ScholarGoogle Scholar
  57. Y. Zhang and F. Mueller. Gstream: A general-purpose data streaming framework on GPU clusters. In ICPP, pages 245--254, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. J. Zhong and B. He. Medusa: Simplified graph processing on gpus. IEEE Trans. Parallel Distrib. Syst., 25(6):1543--1552, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 11, Issue 1
    Proceedings of the 44th International Conference on Very Large Data Bases, Rio de Janeiro, Brazil
    September 2017
    120 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 September 2017
    Published in pvldb Volume 11, Issue 1

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader