Abstract
In this paper we introduce LDBC Graphalytics, a new industrial-grade benchmark for graph analysis platforms. It consists of six deterministic algorithms, standard datasets, synthetic dataset generators, and reference output, which together enable the objective comparison of graph analysis platforms. Its test harness produces deep metrics that quantify multiple kinds of system scalability, such as horizontal/vertical and weak/strong, and of robustness, such as failures and performance variability. The benchmark comes with open-source software for generating data and monitoring performance. We describe and analyze six implementations of the benchmark (three from the community, three from industry), providing insights into the strengths and weaknesses of the platforms. Key to our contribution, the vendors themselves tune and benchmark their platforms.
Index Terms
- LDBC Graphalytics: a benchmark for large-scale graph analysis on parallel and distributed platforms