Abstract
Graphs have become increasingly important to represent highly-interconnected structures and schema-less data including the World Wide Web, social networks, knowledge graphs, genome and scientific databases, medical and government records. The massive scale of graph data easily overwhelms the main memory and computation resources on commodity servers. In these cases, achieving low latency and high throughput requires partitioning the graph and processing the graph data in parallel across a cluster of servers. However, the software and and hardware advances that have worked well for developing parallel databases and scientific applications are not necessarily effective for big-graph problems. Graph processing poses interesting system challenges: graphs represent relationships which are usually irregular and unstructured; and therefore, the computation and data access patterns have poor locality. Hence, the last few years has seen an unprecedented interest in building systems for big-graphs by various communities including databases, systems, semantic web, machine learning, and operations research. In this tutorial, we discuss the design of the emerging systems for processing of big-graphs, key features of distributed graph algorithms, as well as graph partitioning and workload balancing techniques. We emphasize the current challenges and highlight some future research directions.
- P. Cudr-Mauroux and S. Elnikety. Graph Data Management Systems for New Application Domains. In VLDB, 2011.Google ScholarDigital Library
- J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed Graph-parallel Computation on Natural Graphs. In OSDI, 2012. Google ScholarDigital Library
- D. Jiang, G. Chen, B. C. Ooi, K.-L. Tan, and S. Wu. epiC: an Extensible and Scalable System for Processing Big Data. In VLDB, 2014. Google ScholarDigital Library
- A. Khan, Y. Wu, and X. Yan. Emerging Graph Queries in Linked Data. In ICDE, 2012. Google ScholarDigital Library
- Z. Khayyat, K. Awara, A. Alonazi, H. Jamjoom, D. Williams, and P. Kalnis. Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing. In EuroSys, 2013. Google ScholarDigital Library
- A. Kyrola, G. Blelloch, and C. Guestrin. GraphChi: Large-scale Graph Computation on Just a PC. In OSDI, 2012. Google ScholarDigital Library
- Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. In VLDB, 2012. Google ScholarDigital Library
- A. Lumsdaine, D. Gregor, B. Hendrickson, and J. W. Berry. Challenges in Parallel Graph Processing. Parallel Processing Letters, 17(1):5--20, 2007.Google ScholarCross Ref
- G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-scale Graph Processing. In SIGMOD, 2010. Google ScholarDigital Library
- J. Mondal and A. Deshpande. Managing Large Dynamic Graphs Efficiently. In SIGMOD, 2012. Google ScholarDigital Library
- D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: a Timely Dataflow System. In SOSP, 2013. Google ScholarDigital Library
- A. Roy, I. Mihailovic, and W. Zwaenepoel. X-Stream: Edge-centric Graph Processing Using Streaming Partitions. In SOSP, 2013. Google ScholarDigital Library
- S. Sakr, S. Elnikety, and Y. He. G-SPARQL: a Hybrid Engine for Querying Large Attributed Graphs. In CIKM, 2012. Google ScholarDigital Library
- M. Sarwat, S. Elnikety, Y. He, and M. F. Mokbel. Horton+: A Distributed System for Processing Declarative Reachability Queries over Partitioned Graphs. In VLDB, 2013. Google ScholarDigital Library
- R. S. Xin, D. Crankshaw, A. Dave, J. E. Gonzalez, M. J. Franklin, and I. Stoica. GraphX: Unifying Data-Parallel and Graph-Parallel Analytics. CoRR, abs/1402.2394, 2014.Google Scholar
- S. Yang, X. Yan, B. Zong, and A. Khan. Towards Effective Partition Management for Large Graphs. In SIGMOD, 2012. Google ScholarDigital Library
- K. Zeng, J. Yang, H. Wang, B. Shao, and Z. Wang. A Distributed Graph Engine for Web Scale RDF Data. In VLDB, 2013. Google ScholarDigital Library
Index Terms
- Systems for big-graphs
Recommendations
Big Graph Processing Systems: State-of-the-Art and Open Challenges
BIGDATASERVICE '15: Proceedings of the 2015 IEEE First International Conference on Big Data Computing Service and ApplicationsGraph is a fundamental data structure that captures relationships between different data entities. In practice, graphs are widely used for modeling complicated data in different application domains such as social networks, protein networks, ...
Comments