ABSTRACT
Processing graphs, especially at large scale, is an increasingly useful activity in a variety of business, engineering, and scientific domains. Already, there are tens of graph-processing platforms, such as Hadoop, Giraph, and GraphLab, each with a different design and functionality. For graph processing to continue to evolve, users have to find it easy to select a graph-processing platform, and developers and system integrators have to find it easy to quantify the performance and other non-functional aspects of interest. However, the state of performance analysis of graph-processing platforms is still immature: there are few studies, the few that exist have little in common, and there is relatively little understanding of the impact of dataset and algorithm diversity on performance. Our vision is to develop, with the help of the performance-savvy community, a comprehensive benchmarking suite for graph-processing platforms. In this work, we take a step in this direction by proposing a set of seven challenges, summarizing our previous work on performance evaluation of distributed graph-processing platforms, and introducing our ongoing work within the SPEC Research Group's Cloud Working Group.
Index Terms
- Benchmarking graph-processing platforms: a vision