ABSTRACT
Efficient streaming graph processing systems leverage incremental processing by updating computed results to reflect the change in graph structure for the latest graph snapshot. Although certain monotonic path-based algorithms produce correct results by refining intermediate values via numerical comparisons, directly reusing values that were computed before mutation does not work correctly for algorithms that require BSP semantics. Since structural mutations in streaming graphs render the intermediate results unusable, exploiting incremental computation while simultaneously providing synchronous processing guarantees is challenging.
In this paper we develop GraphBolt which incrementally processes streaming graphs while guaranteeing BSP semantics. GraphBolt incorporates dependency-driven incremental processing where it first tracks dependencies to capture how intermediate values get computed, and then uses this information to incrementally propagate the impact of change across intermediate values. To support wide variety of graph-based analytics, GraphBolt provides a generalized incremental programming model that enables development of incremental versions of complex aggregations. Our evaluation shows that GraphBolt's incremental processing eliminates redundant computations and efficiently processes streaming graphs with varying mutation rates, starting from just a single edge mutation all the way up to 1 million edge mutations at a time. Furthermore, being specialized for graph computations, GraphBolt extracts high performance compared to Differential Dataflow.
- Daniel J Abadi, Yanif Ahmad, Magdalena Balazinska, Ugur Cetintemel, Mitch Cherniack, Jeong-Hyon Hwang, Wolfgang Lindner, Anurag Maskey, Alex Rasin, Esther Ryvkina, et al. The Design of the Borealis Stream Processing Engine. In CIDR, volume 5, pages 277--289, 2005.Google Scholar
- Rajagopal Ananthanarayanan, Venkatesh Basker, Sumit Das, Ashish Gupta, Haifeng Jiang, Tianhao Qiu, Alexey Reznichenko, Deomid Ryabkov, Manpreet Singh, and Shivakumar Venkataraman. Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams. In SIGMOD, pages 577--588, New York, NY, USA, 2013. Google ScholarDigital Library
- Bahman Bahmani, Abdur Chowdhury, and Ashish Goel. Fast Incremental and Personalized PageRank. VLDB, 4(3):173--184, December 2010. Google ScholarDigital Library
- Hari Balakrishnan, Magdalena Balazinska, Don Carney, Uğur Çetintemel, Mitch Cherniack, Christian Convey, Eddie Galvez, Jon Salz, Michael Stonebraker, Nesime Tatbul, et al. Retrospective on Aurora. The VLDB Journal, 13(4):370--383, 2004. Google ScholarDigital Library
- Pramod Bhatotia, Umut A Acar, Flavio P Junqueira, and Rodrigo Rodrigues. Slider: Incremental Sliding Window Analytics. In Middleware, pages 61--72. ACM, 2014. Google ScholarDigital Library
- Jose A. Blakeley, Per-Ake Larson, and Frank Wm Tompa. Efficiently Updating Materialized Views. In SIGMOD, pages 61--71, 1986. Google ScholarDigital Library
- Paolo Boldi and Sebastiano Vigna. The WebGraph Framework I: Compression Techniques. In WWW, pages 595--601, 2004. Google ScholarDigital Library
- Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, and Krishna P. Gummadi. Measuring User Influence in Twitter: The Million Follower Fallacy. In ICWSM, pages 10--17, 2010.Google Scholar
- Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs. In EuroSys, pages 1:1--1:15, 2015. Google ScholarDigital Library
- Raymond Cheng, Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan Yang, Lidong Zhou, Feng Zhao, and Enhong Chen. Kineograph: Taking the Pulse of a Fast-changing and Connected World. In EuroSys, pages 85--98, 2012. Google ScholarDigital Library
- Prasanna Desikan, Nishith Pathak, Jaideep Srivastava, and Vipin Kumar. Incremental Page Rank Computation on Evolving Graphs. In WWW, pages 1094--1095, 2005. Google ScholarDigital Library
- David Ediger, Karl Jiang, Jason Riedy, and David A. Bader. Massive Streaming Data Analytics: A Case Study with Clustering Coefficients. In IPDPSW, pages 1--8, 2010.Google Scholar
- David Ediger, Rob Mccoll, Jason Riedy, and David A. Bader. STINGER: High Performance Data Structure for Streaming Graphs. In HPEC, pages 1--5, 2012.Google Scholar
- Friendster network dataset. http://konect.uni-koblenz.de/networks/friendster.KONECT, 2015.Google Scholar
- Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. PowerGraph: Distributed Graph-parallel Computation on Natural Graphs. In OSDI, pages 17--30, 2012. Google ScholarDigital Library
- Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. GraphX: Graph Processing in a Distributed Dataflow Framework. In OSDI, pages 599--613, 2014. Google ScholarDigital Library
- Ashish Gupta, Inderpal Singh Mumick, and V. S. Subrahmanian. Maintaining Views Incrementally. In SIGMOD, pages 157--166, 1993. Google ScholarDigital Library
- Wentao Han, Youshan Miao, Kaiwei Li, Ming Wu, Fan Yang, Lidong Zhou, Vijayan Prabhakaran, Wenguang Chen, and Enhong Chen. Chronos: A Graph Engine for Temporal Graph Analysis. In EuroSys, pages 1:1--1:14, New York, NY, USA, 2014. ACM. Google ScholarDigital Library
- Anand Padmanabha Iyer, Li Erran Li, Tathagata Das, and Ion Stoica. Time-Evolving Graph Processing at Scale. In GRADES, page 5. ACM, 2016. Google ScholarDigital Library
- U Kang, Duen Horng, and Christos Faloutsos. Inference of Beliefs on Billion-Scale Graphs. In KDD-LDMTA, 2010.Google Scholar
- Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is Twitter, A Social Network or a News Media? In WWW, pages 591--600, 2010. Google ScholarDigital Library
- Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski, and Google Inc. Pregel: A System for Large-Scale Graph Processing. In SIGMOD, pages 135--146, 2010. Google ScholarDigital Library
- Andrew McGregor, Sofya Vorotnikova, and Hoa T. Vu. Better Algorithms for Counting Triangles in Data Streams. In PODS, pages 401--411, 2016. Google ScholarDigital Library
- Frank McSherry. A Uniform Approach to Accelerated PageRank Computation. In WWW '05, pages 575--582, 2005. Google ScholarDigital Library
- Frank McSherry, Derek G. Murray, Rebecca Isaacs, and Michael Isard. Differential Dataflow. In CIDR, 2013.Google Scholar
- Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. Naiad: A Timely Dataflow System. In SOSP, pages 439--455. ACM, 2013. Google ScholarDigital Library
- Donald Nguyen, Andrew Lenharth, and Keshav Pingali. A Lightweight Infrastructure for Graph Analytics. In SOSP, pages 456--471, 2013. Google ScholarDigital Library
- Kamal Nigam and Rayid Ghani. Analyzing the Effectiveness and Applicability of Co-training. In CIKM, pages 86--93. ACM, 2000. Google ScholarDigital Library
- Vivek Nigam, Limin Jia, Boon Thau Loo, and Andre Scedrov. Maintaining Distributed Logic Programs Incrementally. In PPDP, pages 125--136, 2011. Google ScholarDigital Library
- Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford University, 1998.Google Scholar
- Frederick Reiss, Kurt Stockinger, Kesheng Wu, Arie Shoshani, and Joseph M. Hellerstein. Enabling Real-Time Querying of Live and Historical Stream Data. In SSDBM, pages 28--, 2007. Google ScholarDigital Library
- Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, and Reynold Cheng. On Querying Historical Evolving Graph Sequences, 2011.Google Scholar
- Jason Riedy and Henning Meyerhenke. Scalable Algorithms for Analysis of Massive, Streaming Graphs. In SIAM PP, 2012.Google Scholar
- Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel. Chaos: Scale-out Graph Processing from Secondary Storage. In SOSP, pages 410--424, 2015. Google ScholarDigital Library
- Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. X-Stream: Edge-centric Graph Processing Using Streaming Partitions. In SOSP, pages 472--488, 2013. Google ScholarDigital Library
- Pratanu Roy, Arijit Khan, and Gustavo Alonso. Augmented Sketch: Faster and More Accurate Stream Processing. In SIGMOD, pages 1449--1463, New York, NY, USA, 2016. ACM. Google ScholarDigital Library
- Dipanjan Sengupta, Narayanan Sundaram, Xia Zhu, Theodore L Willke, Jeffrey Young, Matthew Wolf, and Karsten Schwan. GraphIn: An Online High Performance Incremental Graph Processing Framework. In Euro-Par, pages 319--333. Springer, 2016. Google ScholarDigital Library
- Xiaogang Shi, Bin Cui, Yingxia Shao, and Yunhai Tong. Tornado: A System For Real-Time Iterative Analysis Over Evolving Data. In SIGMOD, pages 417--430, New York, NY, USA, 2016. ACM. Google ScholarDigital Library
- Julian Shun and Guy E. Blelloch. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In PPoPP, pages 135--146, 2013. Google ScholarDigital Library
- Toyotaro Suzumura, Shunsuke Nishii, and Masaru Ganse. Towards Large-scale Graph Stream Processing Platform. In WWW Companion, pages 1321--1326, 2014. Google ScholarDigital Library
- Kanat Tangwongsan, A. Pavan, and Srikanta Tirthapura. Parallel Triangle Counting in Massive Streaming Graphs. In CIKM, pages 781--786, 2013. Google ScholarDigital Library
- Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, et al. Storm @ Twitter. In SIGMOD, pages 147--156. ACM, 2014. Google ScholarDigital Library
- Keval Vora, Rajiv Gupta, and Guoqing Xu. Synergistic Analysis of Evolving Graphs. ACM TACO, 13(4):32, 2016. Google ScholarDigital Library
- Keval Vora, Rajiv Gupta, and Guoqing Xu. KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations. In ASPLOS, pages 237--251, 2017. Google ScholarDigital Library
- Keval Vora, Sai Charan Koduru, and Rajiv Gupta. ASPIRE: Exploiting Asynchronous Parallelism in Iterative Algorithms Using a Relaxed Consistency Based DSM. In OOPSLA, pages 861--878, 2014. Google ScholarDigital Library
- Keval Vora, Guoqing (Harry) Xu, and Rajiv Gupta. Load the Edges You Need: A Generic I/O Optimization for Disk-based Graph Processing. In USENIX ATC, pages 507--522, 2016. Google ScholarDigital Library
- Wikipedia links, english network dataset. http://konect.uni-koblenz.de/networks/wikipedia_link_en. KONECT, 2017.Google Scholar
- Ming Wu, Fan Yang, Jilong Xue, Wencong Xiao, Youshan Miao, Lan Wei, Haoxiang Lin, Yafei Dai, and Lidong Zhou. GraM: Scaling Graph Computation to the Trillions. In SoCC, pages 408--421, New York, NY, USA, 2015. ACM. Google ScholarDigital Library
- Yahoo! Webscope Program. http://webscope.sandbox.yahoo.com/.Google Scholar
- Matei Zaharia, Tathagata Das, Haoyuan Li, Scott Shenker, and Ion Stoica. Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters. In HotCloud, 2012. Google ScholarDigital Library
- Erik Zeitler and Tore Risch. Massive Scale-out of Expensive Continuous Queries. In VLDB, pages 1181--1188, 2011.Google ScholarDigital Library
- Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. Large-Scale Parallel Collaborative Filtering for the Netflix Prize. In AAIM, pages 337--348. Springer, 2008. Google ScholarDigital Library
- Xiaojin Zhu and Zoubin Ghahramani. Learning from Labeled and Unlabeled Data with Label Propagation. In CMU Technical Report CALD-02-107, 2002.Google Scholar
- Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. Gemini: A Computation-Centric Distributed Graph Processing System. In OSDI, pages 301--316, 2016. Google ScholarDigital Library
Recommendations
KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations
Asplos'17Continuous processing of a streaming graph maintains an approximate result of the iterative computation on a recent version of the graph. Upon a user query, the accurate result on the current graph can be quickly computed by feeding the approximate ...
GraphFly: efficient asynchronous streaming graphs processing via dependency-flow
SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and AnalysisExisting streaming graph processing systems typically adopt two phases of refinement and recomputation to ensure the correctness of the incremental computation. However, severe redundant memory accesses exist due to the unnecessary synchronization among ...
GraPU: Accelerate Streaming Graph Analysis through Preprocessing Buffered Updates
SoCC '18: Proceedings of the ACM Symposium on Cloud ComputingStreaming graph analysis extracts timely insights from evolving graphs, and has gained increasing popularity. For current streaming graph analytics systems, incoming updates are simply cached in a buffer, until being applied onto existing graph ...
Comments