skip to main content
10.1145/3302424.3303974acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article

GraphBolt: Dependency-Driven Synchronous Processing of Streaming Graphs

Authors Info & Claims
Published:25 March 2019Publication History

ABSTRACT

Efficient streaming graph processing systems leverage incremental processing by updating computed results to reflect the change in graph structure for the latest graph snapshot. Although certain monotonic path-based algorithms produce correct results by refining intermediate values via numerical comparisons, directly reusing values that were computed before mutation does not work correctly for algorithms that require BSP semantics. Since structural mutations in streaming graphs render the intermediate results unusable, exploiting incremental computation while simultaneously providing synchronous processing guarantees is challenging.

In this paper we develop GraphBolt which incrementally processes streaming graphs while guaranteeing BSP semantics. GraphBolt incorporates dependency-driven incremental processing where it first tracks dependencies to capture how intermediate values get computed, and then uses this information to incrementally propagate the impact of change across intermediate values. To support wide variety of graph-based analytics, GraphBolt provides a generalized incremental programming model that enables development of incremental versions of complex aggregations. Our evaluation shows that GraphBolt's incremental processing eliminates redundant computations and efficiently processes streaming graphs with varying mutation rates, starting from just a single edge mutation all the way up to 1 million edge mutations at a time. Furthermore, being specialized for graph computations, GraphBolt extracts high performance compared to Differential Dataflow.

References

  1. Daniel J Abadi, Yanif Ahmad, Magdalena Balazinska, Ugur Cetintemel, Mitch Cherniack, Jeong-Hyon Hwang, Wolfgang Lindner, Anurag Maskey, Alex Rasin, Esther Ryvkina, et al. The Design of the Borealis Stream Processing Engine. In CIDR, volume 5, pages 277--289, 2005.Google ScholarGoogle Scholar
  2. Rajagopal Ananthanarayanan, Venkatesh Basker, Sumit Das, Ashish Gupta, Haifeng Jiang, Tianhao Qiu, Alexey Reznichenko, Deomid Ryabkov, Manpreet Singh, and Shivakumar Venkataraman. Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams. In SIGMOD, pages 577--588, New York, NY, USA, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Bahman Bahmani, Abdur Chowdhury, and Ashish Goel. Fast Incremental and Personalized PageRank. VLDB, 4(3):173--184, December 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Hari Balakrishnan, Magdalena Balazinska, Don Carney, Uğur Çetintemel, Mitch Cherniack, Christian Convey, Eddie Galvez, Jon Salz, Michael Stonebraker, Nesime Tatbul, et al. Retrospective on Aurora. The VLDB Journal, 13(4):370--383, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Pramod Bhatotia, Umut A Acar, Flavio P Junqueira, and Rodrigo Rodrigues. Slider: Incremental Sliding Window Analytics. In Middleware, pages 61--72. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Jose A. Blakeley, Per-Ake Larson, and Frank Wm Tompa. Efficiently Updating Materialized Views. In SIGMOD, pages 61--71, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Paolo Boldi and Sebastiano Vigna. The WebGraph Framework I: Compression Techniques. In WWW, pages 595--601, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, and Krishna P. Gummadi. Measuring User Influence in Twitter: The Million Follower Fallacy. In ICWSM, pages 10--17, 2010.Google ScholarGoogle Scholar
  9. Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs. In EuroSys, pages 1:1--1:15, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Raymond Cheng, Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan Yang, Lidong Zhou, Feng Zhao, and Enhong Chen. Kineograph: Taking the Pulse of a Fast-changing and Connected World. In EuroSys, pages 85--98, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Prasanna Desikan, Nishith Pathak, Jaideep Srivastava, and Vipin Kumar. Incremental Page Rank Computation on Evolving Graphs. In WWW, pages 1094--1095, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. David Ediger, Karl Jiang, Jason Riedy, and David A. Bader. Massive Streaming Data Analytics: A Case Study with Clustering Coefficients. In IPDPSW, pages 1--8, 2010.Google ScholarGoogle Scholar
  13. David Ediger, Rob Mccoll, Jason Riedy, and David A. Bader. STINGER: High Performance Data Structure for Streaming Graphs. In HPEC, pages 1--5, 2012.Google ScholarGoogle Scholar
  14. Friendster network dataset. http://konect.uni-koblenz.de/networks/friendster.KONECT, 2015.Google ScholarGoogle Scholar
  15. Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. PowerGraph: Distributed Graph-parallel Computation on Natural Graphs. In OSDI, pages 17--30, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. GraphX: Graph Processing in a Distributed Dataflow Framework. In OSDI, pages 599--613, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Ashish Gupta, Inderpal Singh Mumick, and V. S. Subrahmanian. Maintaining Views Incrementally. In SIGMOD, pages 157--166, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Wentao Han, Youshan Miao, Kaiwei Li, Ming Wu, Fan Yang, Lidong Zhou, Vijayan Prabhakaran, Wenguang Chen, and Enhong Chen. Chronos: A Graph Engine for Temporal Graph Analysis. In EuroSys, pages 1:1--1:14, New York, NY, USA, 2014. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Anand Padmanabha Iyer, Li Erran Li, Tathagata Das, and Ion Stoica. Time-Evolving Graph Processing at Scale. In GRADES, page 5. ACM, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. U Kang, Duen Horng, and Christos Faloutsos. Inference of Beliefs on Billion-Scale Graphs. In KDD-LDMTA, 2010.Google ScholarGoogle Scholar
  21. Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is Twitter, A Social Network or a News Media? In WWW, pages 591--600, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski, and Google Inc. Pregel: A System for Large-Scale Graph Processing. In SIGMOD, pages 135--146, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Andrew McGregor, Sofya Vorotnikova, and Hoa T. Vu. Better Algorithms for Counting Triangles in Data Streams. In PODS, pages 401--411, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Frank McSherry. A Uniform Approach to Accelerated PageRank Computation. In WWW '05, pages 575--582, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Frank McSherry, Derek G. Murray, Rebecca Isaacs, and Michael Isard. Differential Dataflow. In CIDR, 2013.Google ScholarGoogle Scholar
  26. Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. Naiad: A Timely Dataflow System. In SOSP, pages 439--455. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Donald Nguyen, Andrew Lenharth, and Keshav Pingali. A Lightweight Infrastructure for Graph Analytics. In SOSP, pages 456--471, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Kamal Nigam and Rayid Ghani. Analyzing the Effectiveness and Applicability of Co-training. In CIKM, pages 86--93. ACM, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Vivek Nigam, Limin Jia, Boon Thau Loo, and Andre Scedrov. Maintaining Distributed Logic Programs Incrementally. In PPDP, pages 125--136, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford University, 1998.Google ScholarGoogle Scholar
  31. Frederick Reiss, Kurt Stockinger, Kesheng Wu, Arie Shoshani, and Joseph M. Hellerstein. Enabling Real-Time Querying of Live and Historical Stream Data. In SSDBM, pages 28--, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, and Reynold Cheng. On Querying Historical Evolving Graph Sequences, 2011.Google ScholarGoogle Scholar
  33. Jason Riedy and Henning Meyerhenke. Scalable Algorithms for Analysis of Massive, Streaming Graphs. In SIAM PP, 2012.Google ScholarGoogle Scholar
  34. Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel. Chaos: Scale-out Graph Processing from Secondary Storage. In SOSP, pages 410--424, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. X-Stream: Edge-centric Graph Processing Using Streaming Partitions. In SOSP, pages 472--488, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Pratanu Roy, Arijit Khan, and Gustavo Alonso. Augmented Sketch: Faster and More Accurate Stream Processing. In SIGMOD, pages 1449--1463, New York, NY, USA, 2016. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Dipanjan Sengupta, Narayanan Sundaram, Xia Zhu, Theodore L Willke, Jeffrey Young, Matthew Wolf, and Karsten Schwan. GraphIn: An Online High Performance Incremental Graph Processing Framework. In Euro-Par, pages 319--333. Springer, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Xiaogang Shi, Bin Cui, Yingxia Shao, and Yunhai Tong. Tornado: A System For Real-Time Iterative Analysis Over Evolving Data. In SIGMOD, pages 417--430, New York, NY, USA, 2016. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Julian Shun and Guy E. Blelloch. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In PPoPP, pages 135--146, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Toyotaro Suzumura, Shunsuke Nishii, and Masaru Ganse. Towards Large-scale Graph Stream Processing Platform. In WWW Companion, pages 1321--1326, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Kanat Tangwongsan, A. Pavan, and Srikanta Tirthapura. Parallel Triangle Counting in Massive Streaming Graphs. In CIKM, pages 781--786, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, et al. Storm @ Twitter. In SIGMOD, pages 147--156. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Keval Vora, Rajiv Gupta, and Guoqing Xu. Synergistic Analysis of Evolving Graphs. ACM TACO, 13(4):32, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Keval Vora, Rajiv Gupta, and Guoqing Xu. KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations. In ASPLOS, pages 237--251, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Keval Vora, Sai Charan Koduru, and Rajiv Gupta. ASPIRE: Exploiting Asynchronous Parallelism in Iterative Algorithms Using a Relaxed Consistency Based DSM. In OOPSLA, pages 861--878, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Keval Vora, Guoqing (Harry) Xu, and Rajiv Gupta. Load the Edges You Need: A Generic I/O Optimization for Disk-based Graph Processing. In USENIX ATC, pages 507--522, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Wikipedia links, english network dataset. http://konect.uni-koblenz.de/networks/wikipedia_link_en. KONECT, 2017.Google ScholarGoogle Scholar
  48. Ming Wu, Fan Yang, Jilong Xue, Wencong Xiao, Youshan Miao, Lan Wei, Haoxiang Lin, Yafei Dai, and Lidong Zhou. GraM: Scaling Graph Computation to the Trillions. In SoCC, pages 408--421, New York, NY, USA, 2015. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Yahoo! Webscope Program. http://webscope.sandbox.yahoo.com/.Google ScholarGoogle Scholar
  50. Matei Zaharia, Tathagata Das, Haoyuan Li, Scott Shenker, and Ion Stoica. Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters. In HotCloud, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Erik Zeitler and Tore Risch. Massive Scale-out of Expensive Continuous Queries. In VLDB, pages 1181--1188, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. Large-Scale Parallel Collaborative Filtering for the Netflix Prize. In AAIM, pages 337--348. Springer, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Xiaojin Zhu and Zoubin Ghahramani. Learning from Labeled and Unlabeled Data with Label Propagation. In CMU Technical Report CALD-02-107, 2002.Google ScholarGoogle Scholar
  54. Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. Gemini: A Computation-Centric Distributed Graph Processing System. In OSDI, pages 301--316, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Conferences
    EuroSys '19: Proceedings of the Fourteenth EuroSys Conference 2019
    March 2019
    714 pages
    ISBN:9781450362818
    DOI:10.1145/3302424

    Copyright © 2019 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 25 March 2019

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate241of1,308submissions,18%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader