ABSTRACT
It is highly desired for existing distributed graph processing systems to support both offline analytics and online queries adaptively. Existing offline graph analytics systems are mostly based on synchronous model. Although achieving high throughput, they suffer relatively high latency in answering simple queries due to synchronization overhead and slow convergence. On the other hand, online graph query systems adopting asynchronous model can response at any time, while incur overwhelmed messages and network packets, making them unable to meet the high throughput demand of offline analytics. In this work, we propose an adaptive asynchronous message processing (AAMP) method, which improves the efficiency of network communication while maintains low latency, to efficiently support offline analytics and online queries in one graph processing framework. We then design GiraphAsync, an implementation of AAMP on top of Apache Giraph, and evaluate it using several representative offline analytics and online queries on large graph datasets. Experimental results show that GiraphAsync gains an up to 10X improvement over synchronous model systems for graph analytics, while performs as well as specialized systems for online graph queries.
- Apache Giraph. http://incubator.apache.org/giraph/.Google Scholar
- Jena. http://jena.apache.org/.Google Scholar
- Neo4j. http://neo4j.com/.Google Scholar
- M. Atre, V. Chaoji, M. J. Zaki, and J. A. Hendler. Matrix "bit" loaded: a scalable lightweight join query processor for rdf data. In WWW, pages 41--50, 2010. Google ScholarDigital Library
- T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press, 3rd edition. Google ScholarDigital Library
- J. Gao, C. Zhou, J. Zhou, and J. X. Yu. Continuous pattern detection over billion-edge graph using distributed framework. In ICDE, pages 556--567, 2014.Google ScholarCross Ref
- G.Malewicz, M.H.Austern, A.J.C.Bik, J.C.Dehnert, I.Horn, N.Leiser, and G.Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD, pages 135--146, 2010. Google ScholarDigital Library
- Y. Guo, Z. Pan, and J. Heflin. Lubm: A benchmark for owl knowledge base systems. Journal of Web Semantics, 3(2--3):158--182, 2005. Google ScholarDigital Library
- M. Han and K. Daudjee. Giraph unchained: Barrierless asynchronous parallel execution in pregel-like graph processing systems. PVLDB, 8(9), 2015. Google ScholarDigital Library
- M. Han, K. Daudjee, K. Ammar, M. Tamerozsu, X. Wang, and T. Jin. An experimental comparison of pregel-like graph processing systems. PVLDB, 7(12):1047--1058, 2014. Google ScholarDigital Library
- B. Iordanov. Hypergraphdb: a generalized graph database. In WAIM, pages 25--36. Springer, 2010. Google ScholarDigital Library
- K.Haewoon, L.Changhyun, P.Hosung, and M.Sue. What is Twitter, a social network or a news media? In WWW, pages 591--600, 2010. Google ScholarDigital Library
- A. Kyrola, G. Blelloch, and C. Guestrin. Graphchi:large-scale graph computation on just a pc. OSDI, pages 31--46, 2012. Google ScholarDigital Library
- Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed graphlab: A framework for machine learning and data mining in the cloud. PVLDB, 5(8):716--727, 2012. Google ScholarDigital Library
- Y. Lu, J. Cheng, D. Yan, and H. Wu. Large-scale distributed graph computing systems: An experimental evaluation. PVLDB, 8(3):281--292, 2014. Google ScholarDigital Library
- R. R. McCune, T. Weninger, and G. Madey. Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Comput. Surv., 48(2), 2015. Google ScholarDigital Library
- T. Neumann and G. Weikum. Rdf-3x: a risc-style engine for rdf. PVLDB, 1(1), 2008. Google ScholarDigital Library
- M. Sarwat, S. Elnikety, Y. He, and M. F. Mokbel. HortonGoogle Scholar
- : A distributed system for processing declarative reachability queries over partitioned graphs. PVLDB, 6(14), 2013. Google ScholarDigital Library
- B. Shao, H. Wang, and Y. Li. Trinity: A distributed graph engine on a memory cloud. In SIGMOD, pages 505--516, 2013. Google ScholarDigital Library
- Y. Tian, A. Balmin, S. A. Corsten, S. Tatikonda, and J. McPherson. From "think like a vertex" to "think like a graph". PVLDB, 7(3):193--204, 2013. Google ScholarDigital Library
- U.Kang, C.E.Tsourakakis, and C.Faloutsos. Pegasus: A peta-scale graph mining system. In ICDE, pages 229--238, 2009. Google ScholarDigital Library
- L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103--111, 1990. Google ScholarDigital Library
- G. Wang, W. Xie, A. Demers, and J. Gehrke. Asynchronous large-scale graph processing made easy. In CIDR, 2013.Google Scholar
- C. Xie, R. Chen, H. Guan, B. Zang, and H. Chen. ync or async: Time to fuse for distributed graph-parallel computation. PPoPP, 2015. Google ScholarDigital Library
- Y. Zhang, Q. Gao, L. Gao, and C. Wang. Accelerate large-scale iterative computation through asynchronous accumulative updates. In ScienceCloud '12, pages 13--22, 2012. Google ScholarDigital Library
- C. Zhou, J. Gao, B. Sun, and J. X. Yu. Mocgraph: Scalable distributed graph processing using message online computing. PVLDB, 8(4):377--388, 2014. Google ScholarDigital Library
Index Terms
- GiraphAsync: Supporting Online and Offline Graph Processing via Adaptive Asynchronous Message Processing
Recommendations
A survey of current challenges in partitioning and processing of graph-structured data in parallel and distributed systems
AbstractOne of the concepts that attracts attention since entering of big data era is the graph-structured data. Suitable frameworks to handle such data would face several constraints, especially scalability, partitioning challenges, processing complexity ...
Random Walk TripleRush: Asynchronous Graph Querying and Sampling
WWW '15: Proceedings of the 24th International Conference on World Wide WebMost Semantic Web applications rely on querying graphs, typically by using SPARQL with a triple store. Increasingly, applications also analyze properties of the graph structure to compute statistical inferences. The current Semantic Web infrastructure, ...
Fast Iterative Graph Computation with Resource Aware Graph Parallel Abstractions
HPDC '15: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed ComputingIterative computation on large graphs has challenged system research from two aspects: (1) how to conduct high performance parallel processing for both in-memory and out-of-core graphs; and (2) how to handle large graphs that exceed the resource ...
Comments