research-article

The ubiquity of large graphs and surprising challenges of graph processing

Published: 1 December 2017

Abstract

Graph processing is becoming increasingly prevalent across many application domains. In spite of this prevalence, there is little research about how graphs are actually used in practice. We conducted an online survey aimed at understanding: (i) the types of graphs users have; (ii) the graph computations users run; (iii) the types of graph software users use; and (iv) the major challenges users face when processing their graphs. We describe the participants' responses to our questions, highlighting common patterns and challenges. We further reviewed user feedback in the mailing lists, bug reports, and feature requests in the source repositories of a large suite of software products for processing graphs. Through our review, we were able to answer some new questions raised by participants' responses and to identify specific challenges that users face when using different classes of graph software. The participants' responses and the data we obtained revealed surprising facts about graph processing in practice. In particular, real-world graphs represent a very diverse range of entities and are often very large, and scalability and visualization are undeniably the most pressing challenges faced by participants. We hope these findings can guide future research.

Published in

  Proceedings of the VLDB Endowment, Volume 11, Issue 4
  December 2017
  133 pages
  ISSN: 2150-8097

  Publisher: VLDB Endowment
