Abstract
Graph processing is becoming increasingly prevalent across many application domains. In spite of this prevalence, there is little research about how graphs are actually used in practice. We conducted an online survey aimed at understanding: (i) the types of graphs users have; (ii) the graph computations users run; (iii) the types of graph software users use; and (iv) the major challenges users face when processing their graphs. We describe the participants' responses to our questions highlighting common patterns and challenges. We further reviewed user feedback in the mailing lists, bug reports, and feature requests in the source repositories of a large suite of software products for processing graphs. Through our review, we were able to answer some new questions that were raised by participants' responses and identify specific challenges that users face when using different classes of graph software. The participants' responses and data we obtained revealed surprising facts about graph processing in practice. In particular, real-world graphs represent a very diverse range of entities and are often very large, and scalability and visualization are undeniably the most pressing challenges faced by participants. We hope these findings can guide future research.
- C. C. Aggarwal and H. Wang. Graph Data Management and Mining: A Survey of Algorithms and Applications, pages 13--68. Springer US, 2010.Google Scholar
- R. Angles, M. Arenas, P. Barceló, A. Hogan, J. L. Reutter, and D. Vrgoc. Foundations of Modern Graph Query Languages. CoRR, abs/1610.06264, 2016.Google Scholar
- ArrangoDB. https://www.arangodb.com.Google Scholar
- M. Balcan and K. Q. Weinberger, editors. Proceedings of the International Conference on Machine Learning, 2016. http://jmlr.org/proceedings/papers/v48/. Google ScholarDigital Library
- O. Batarfi, R. E. Shawi, A. G. Fayoumi, R. Nouri, S.-M.-R. Beheshti, A. Barnawi, and S. Sakr. Large Scale Graph Processing Systems: Survey and an Experimental Evaluation. Cluster Computing, 18(3):1189--1213, 2015. Google ScholarDigital Library
- Basic Linear Algebra Subprograms. http://www.netlib.org/blas.Google Scholar
- S. Bridgeman and R. Tamassia. A User Study in Similarity Measures for Graph Drawing, pages 19--30. Springer Berlin Heidelberg, 2001. Google ScholarDigital Library
- Caley Graph Database. https://cayley.io.Google Scholar
- L. Cao, C. Zhang, T. Joachims, G. I. Webb, D. D. Margineantu, and G. Williams, editors. Proceedings of the International Conference on Knowledge Discovery and Data Mining, 2015. http://dl.acm.org/citation.cfm?id=2783258.Google Scholar
- A. Ching, S. Edunov, M. Kabiljo, D. Logothetis, and S. Muthukrishnan. One Trillion Edges: Graph Processing at Facebook-Scale. PVLDB, 8(12):1804--1815, 2015. Google ScholarDigital Library
- Conceptual Graphs. http://conceptualgraphs.org.Google Scholar
- W. Cui and H. Qu. A Survey on Graph Visualization. PhD Qualifying Exam Report, Computer Science Department, Hong Kong University of Science and Technology, 2007.Google Scholar
- Cytoscape. http://www.cytoscape.org.Google Scholar
- DGraph. https://dgraph.io.Google Scholar
- DTD and XSD XML Schemas. https://www.w3.org/standards/xml/schema.Google Scholar
- Elasticsearch X-Pack Graph. https://www.elastic.co/products/x-pack/graph.Google Scholar
- Apache Flink. https://flink.apache.org.Google Scholar
- Apache Flink User Survey 2016. https://github.com/dataArtisans/flink-user-survey-2016.Google Scholar
- Gephi. https://gephi.org.Google Scholar
- S. Ghandeharizadeh, S. Barahmand, M. Balazinska, and M. J. Freedman, editors. Proceedings of the Symposium on Cloud Computing, 2015. Google ScholarDigital Library
- Apache Giraph. https://giraph.apache.org.Google Scholar
- Graph for Scala. http://www.scala-graph.org.Google Scholar
- Graph 500 Benchmarks. http://graph500.org.Google Scholar
- GraphStream. http://graphstream-project.org.Google Scholar
- Graph-tool. https://graph-tool.skewed.de.Google Scholar
- Graphviz. https://graphviz.readthedocs.io.Google Scholar
- Apache Spark GraphX. https://spark.apache.org/graphx.Google Scholar
- Apache TinkerPop. https://tinkerpop.apache.org.Google Scholar
- P. Haase, J. Broekstra, A. Eberhart, and R. Volz. A Comparison of RDF Query Languages, pages 502--517. Springer Berlin Heidelberg, 2004. Google ScholarDigital Library
- I. Herman, G. Melançon, and M. S. Marshall. Graph Visualization and Navigation in Information Visualization: A Survey. IEEE Transactions on Visualization and Computer Graphics, 6(1):24--43, 2000. Google ScholarDigital Library
- D. Holten and J. J. van Wijk. A User Study on Visualizing Directed Edges in Graphs. In Proceedings of International Conference on Human Factors in Computing Systems, pages 2299--2308, 2009. Google ScholarDigital Library
- F. Holzschuher and R. Peinl. Performance of Graph Query Languages: Comparison of Cypher, Gremlin and Native Access in Neo4j. In Proceedings of the Joint EDBT/ICDT Workshops, pages 195--204, 2013. Google ScholarDigital Library
- ISO/IEC Directives, Part 1. http://www.iso.org/sites/directives/directives.html#toc_marker-16.Google Scholar
- H. V. Jagadish and A. Zhou, editors. PVLDB, Volume 7, 2013--2014. http://www.vldb.org/pvldb/vol7.html.Google Scholar
- JanusGraph. http://janusgraph.org.Google Scholar
- N. Jayaram, A. Khan, C. Li, X. Yan, and R. Elmasri. Querying Knowledge Graphs by Example Entity Tuples. CoRR, abs/1311.2100, 2013.Google Scholar
- JDBC. http://www.oracle.com/technetwork/java/overview-141217.html.Google Scholar
- Apache Jena. https://jena.apache.org.Google Scholar
- A. Katifori, C. Halatsis, G. Lepouras, C. Vassilakis, and E. Giannopoulou. Ontology Visualization Methods: A Survey. ACM Computing Surveys, 39(4):10, 2007. Google ScholarDigital Library
- K. Keeton and T. Roscoe, editors. Proceedings of the Symposium on Operating Systems Design and Implementation, 2016. https://www.usenix.org/conference/osdi16. Google ScholarDigital Library
- LDBC Benchmarks. http://ldbcouncil.org/benchmarks.Google Scholar
- LDBC D6.6.4 Standardization Report. http://ldbcouncil.org/sites/default/files/LDBC_D6.6.4.pdf.Google Scholar
- I. Letunic and P. Bork. Interactive Tree Of Life: An Online Tool for Phylogenetic Tree Display and Annotation. Bioinformatics, 23(1):127--128, 2006. Google ScholarDigital Library
- Y. Lu, J. Cheng, D. Yan, and H. Wu. Large-scale Distributed Graph Computing Systems: An Experimental Evaluation. PVLDB, 8(3):281--292, 2014. Google ScholarDigital Library
- G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-scale Graph Processing. In Proceedings of International Conference on Management of Data, pages 135--146, 2010. Google ScholarDigital Library
- MATLAB. https://www.mathworks.com.Google Scholar
- T. Mattson, D. A. Bader, J. W. Berry, A. Buluç, J. J. Dongarra, C. Faloutsos, J. Feo, J. R. Gilbert, J. Gonzalez, B. Hendrickson, J. Kepner, C. E. Leiserson, A. Lumsdaine, D. A. Padua, S. Poole, S. P. Reinhardt, M. Stonebraker, S. Wallach, and A. Yoo. Standards for Graph Algorithm Primitives. In Proceedings of High Performance Extreme Computing Conference, pages 1--2, 2013.Google ScholarCross Ref
- Neo4j. https://neo4j.com.Google Scholar
- The 2016 State of the Graph Report, https://neo4j.com/resources/2016-state-of-the-graph.Google Scholar
- NetworKit. https://networkit.iti.kit.edu.Google Scholar
- NetworkX. https://networkx.github.io.Google Scholar
- openCypher. http://www.opencypher.org.Google Scholar
- OrientDB. https://orientdb.com.Google Scholar
- M. Paradies, M. Rudolf, and W. Lehner. GraphVista: Interactive Exploration Of Large Graphs. CoRR, abs/1506.00394, 2015.Google Scholar
- PGQL: Property Graph Query Language. http://pgql-lang.org.Google Scholar
- R. Pienta, F. Hohman, A. Tamersoy, A. Endert, S. Navathe, H. Tong, and D. H. Chau. Visual Graph Query Construction and Refinement. In Proceedings of International Conference on Management of Data, pages 1587--1590, 2017. Google ScholarDigital Library
- R. Pienta, A. Tamersoy, A. Endert, S. Navathe, H. Tong, and D. H. Chau. VISAGE: Interactive Visual Graph Querying. In Proceedings of International Working Conference on Advanced Visual Interfaces, pages 272--279, 2016. Google ScholarDigital Library
- M. Rath, D. Akehurst, C. Borowski, and P. Mäder. Are graph query languages applicable for requirements traceability analysis? In Proceedings of International Conference on Requirements Engineering: Foundation for Software Quality, 2017.Google Scholar
- M. A. Rodriguez. The Gremlin Graph Traversal Machine and Language. CoRR, abs/1508.03843, 2015. Google ScholarDigital Library
- S. Salihoglu, J. Shin, V. Khanna, B. Q. Truong, and J. Widom. Graft: A Debugging Tool For Apache Giraph. Technical report, Stanford University, 2014. http://ilpubs.stanford.edu:8090/1109/.Google Scholar
- A. Sharma, J. Jiang, P. Bommannavar, B. Larson, and J. Lin. GraphJet: Real-Time Content Recommendations at Twitter. PVLDB, 9(13):1281--1292, 2016. Google ScholarDigital Library
- SNAP: Standford Network Analysis Project. https://snap.stanford.edu.Google Scholar
- Lightbend Apache Survey 2015. https://info.lightbend.com/COLL-20XX-Spark-Survey-Report_LP.html.Google Scholar
- Sparksee. http://www.sparsity-technologies.com.Google Scholar
- The TPC-C benchmark. http://www.tpc.org/tpcc.Google Scholar
- C. Vehlow, F. Beck, and D. Weiskopf. Visualizing Group Structures in Graphs: A Survey. Computer Graphics Forum, 36(6):201--225, 2017. Google ScholarDigital Library
- OpenLink Virtuoso. https://virtuoso.openlinksw.com.Google Scholar
- C. Wang and J. Tao. Graphs in Scientific Visualization: A Survey. Computer Graphics Forum, 36(1):263--287, 2017. Google ScholarDigital Library
- J. West and C. M. Pancake, editors. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2016. https://dl.acm.org/citation.cfm?id=3014904. Google ScholarDigital Library
Index Terms
- The ubiquity of large graphs and surprising challenges of graph processing
Recommendations
Large Induced Forests in Graphs
In this article, we prove three theorems. The first is that every connected graph of order n and size m has an induced forest of order at least 8n-2m-2/9 with equality if and only if such a graph is obtained from a tree by expanding every vertex to a ...
Connectivity of k-extendable graphs with large k
Discrete mathematics and theoretical computer science (DMTCS)Let G be a simple connected graph on 2n vertices with perfect matching. For a given positive integer k (0 ≤ k ≤ n - 1), G is k-extendable if any matching of size k in G is contained in a perfect matching of G. It is proved that if G is a k-extendable ...
The clique-separator graph for chordal graphs
We present a new representation of a chordal graph called the clique-separator graph, whose nodes are the maximal cliques and minimal vertex separators of the graph. We present structural properties of the clique-separator graph and additional ...
Comments