ABSTRACT
Big data has shifted the computing paradigm of data analysis. While some of the data can be treated as simple texts or independent data records, many other applications have data with structural patterns which are modeled as a graph, such as social media, road network traffic and smart grid, etc. However, there is still limited amount of work has been done to address the velocity problem of graph processing. In this work, we aim to develop a distributed processing system for solving pattern matching queries on streaming graphs where graphs evolve over time upon the arrives of streaming graph update events. To achieve the goal, we proposed an incremental pattern matching algorithm and implemented it on GPS, a vertex centric distributed graph computing framework. We also extended the GPS framework to support streaming graph, and adapted a subgraphcentric data model to further reduce communication overhead and system performance. Our evaluation using real wiki trace shows that our approach achieves a 3x -- 10x speedup over the batch algorithm, and significantly reduces network and memory usage.
- Apache giraph. http://giraph.apache.org/.Google Scholar
- D. A. Bader, J. Berry, A. Amos-Binks, D. Chavarría-Miranda, C. Hastings, K. Madduri, and S. C. Poulos. Stinger: Spatio-temporal interaction networks and graphs (sting) extensible representation. Georgia Institute of Technology, Tech. Rep, 2009.Google Scholar
- J. Brynielsson, J. Hogberg, L. Kaati, C. Martenson, and P. Svenson. Detecting social positions using simulation. In Advances in Social Networks Analysis and Mining, International Conference on, pages 48--55, Aug 2010. Google ScholarDigital Library
- R. Cheng, J. Hong, A. Kyrola, Y. Miao, X. Weng, M. Wu, F. Yang, L. Zhou, F. Zhao, and E. Chen. Kineograph: Taking the pulse of a fast-changing and connected world. In ACM EuroSys, pages 85--98, 2012. Google ScholarDigital Library
- W. Fan, J. Li, J. Luo, Z. Tan, X. Wang, and Y. Wu. Incremental graph pattern matching. In ACM SIGCOM. Google ScholarDigital Library
- W. Fan, J. Li, S. Ma, N. Tang, Y. Wu, and Y. Wu. Graph pattern matching: From intractable to polynomial time. Proc. VLDB Endow., 3(1--2):264--275, Sept. 2010. Google ScholarDigital Library
- W. Fan, X. Wang, Y. Wu, and D. Deng. Distributed graph simulation: Impossibility and possibility. Proc. VLDB Endow., 7(12):1083--1094, Aug. 2014. Google ScholarDigital Library
- A. Fard, M. Nisar, L. Ramaswamy, J. Miller, and M. Saltz. A distributed vertex-centric approach for pattern matching in massive graphs. In Big Data, 2013 IEEE International Conference on, pages 403--411, Oct 2013.Google ScholarCross Ref
- M. R. Garey and D. S. Johnson. Computers and Intractability; A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1990. Google ScholarDigital Library
- J. C. Godskesen and S. Nanz. Mobility models and behavioural equivalence for wireless networks. In Coordination Models and Languages, pages 106--122, 2009. Google ScholarDigital Library
- M. R. Henzinger, T. A. Henzinger, and P. W. Kopke. Computing simulations on finite and infinite graphs. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science, FOCS '95, pages 453--, 1995. Google ScholarDigital Library
- Z. Khayyat, K. Awara, A. Alonazi, H. Jamjoom, D. Williams, and P. Kalnis. Mizan: a system for dynamic load balancing in large-scale graph processing. In ACM EuroSys, pages 169--182, 2013. Google ScholarDigital Library
- Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed graphlab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 5(8):716--727, Apr. 2012. Google ScholarDigital Library
- S. Ma, Y. Cao, J. Huai, and T. Wo. Distributed graph pattern matching. In Proceedings of the 21st International Conference on World Wide Web, pages 949--958, 2012. Google ScholarDigital Library
- G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In ACM SIGCOM, pages 135--146, 2010. Google ScholarDigital Library
- A. McGregor. Graph stream algorithms: A survey. SIGMOD Rec., 43(1):9--20, May 2014. Google ScholarDigital Library
- A. Ntoulas, J. Cho, and C. Olston. What's new on the web?: The evolution of the web from a search engine perspective. In Proceedings of the 13th International Conference on World Wide Web, pages 1--12, 2004. Google ScholarDigital Library
- A. Roy, I. Mihailovic, and W. Zwaenepoel. X-stream: Edge-centric graph processing using streaming partitions. In Proceedings of ACM Symposium on Operating Systems Principles, pages 472--488, 2013. Google ScholarDigital Library
- S. Salihoglu and J. Widom. Gps: A graph processing system. In SSDBM. Google ScholarDigital Library
- Y. Simmhan, A. Kumbhare, C. Wickramaarachchi, S. Nagarkar, S. Ravi, C. Raghavendra, and V. Prasanna. Goffish: A sub-graph centric framework for large-scale graph analytics. In Euro-Par, pages 451--462. 2014.Google ScholarCross Ref
- I. Stanton and G. Kliot. Streaming graph partitioning for large distributed graphs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1222--1230, 2012. Google ScholarDigital Library
- Y. Tian, A. Balmin, S. A. Corsten, S. Tatikonda, and J. McPherson. From think like a vertex to think like a graph. Proceedings of the VLDB Endowment, 7(3):193--204, 2013. Google ScholarDigital Library
- L. Vaquero, F. Cuadrado, D. Logothetis, and C. Martella. Adaptive partitioning for large-scale dynamic graphs. In Proceedings of Symposium on Cloud Computing, pages 35:1--35:2, 2013. Google ScholarDigital Library
- C. Wickramaarachchi, M. Frincu, and V. Prasanna. Enabling real-time pro-active analytics on streaming graphs. algorithms, 15:18, 2010.Google Scholar
- R. S. Xin, J. E. Gonzalez, M. J. Franklin, and I. Stoica. Graphx: A resilient distributed graph system on spark. In First International Workshop on Graph Data Management Experiences and Systems, pages 2:1--2:6, 2013. Google ScholarDigital Library
Index Terms
- Distributed Incremental Pattern Matching on Streaming Graphs
Recommendations
A Survey on Distributed Graph Pattern Matching in Massive Graphs
Besides its NP-completeness, the strict constraints of subgraph isomorphism are making it impractical for graph pattern matching (GPM) in the context of big data. As a result, relaxed GPM models have emerged as they yield interesting results in a ...
Incremental graph pattern matching
Graph pattern matching is commonly used in a variety of emerging applications such as social network analysis. These applications highlight the need for studying the following two issues. First, graph pattern matching is traditionally defined in terms ...
Scalable Pattern Matching over Compressed Graphs via Dedensification
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data MiningOne of the most common operations on graph databases is graph pattern matching (e.g., graph isomorphism and more general types of "subgraph pattern matching"). In fact, in some graph query languages every single query is expressed as a graph matching ...
Comments