skip to main content
10.1145/2939672.2939856acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article
Public Access

Scalable Pattern Matching over Compressed Graphs via Dedensification

Published:13 August 2016Publication History

ABSTRACT

One of the most common operations on graph databases is graph pattern matching (e.g., graph isomorphism and more general types of "subgraph pattern matching"). In fact, in some graph query languages every single query is expressed as a graph matching operation. Consequently, there has been a significant amount of research effort in optimizing graph matching operations in graph database systems. As graph databases have scaled in recent years, so too has recent work on scaling graph matching operations. However, the performance of recent proposals for scaling graph pattern matching is limited by the presence of high-degree nodes. These high-degree nodes result in an explosion of intermediate result sizes during query execution, and therefore significant performance bottlenecks. In this paper we present a dedensification technique that losslessly compresses the neighborhood around high-degree nodes. Furthermore, we introduce a query processing technique that enables direct operation of graph query processing operations over the compressed data, without ever having to decompress the data. For pattern matching operations, we show how this technique can be implemented as a layer above existing graph database systems, so that the end-user can benefit from this technique without requiring modifications to the core graph database engine code. Our technique reduces the size of the intermediate result sets during query processing, and thereby improves query performance.

References

  1. A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, no. 5439, 1999.Google ScholarGoogle Scholar
  2. J. Leskovec, J. M. Kleinberg, and C. Faloutsos, "Graph evolution: Densification and shrinking diameters," TKDD, vol. 1, no. 1, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Meusel, S. Vigna, O. Lehmberg, and C. Bizer, "Graph structure in the web - revisited: a trick of the heavy tail," in WWW, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. C. Weiss, P. Karras, and A. Bernstein, "Hexastore: sextuple indexing for semantic web data management," PVLDB, vol. 1, no. 1, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. T. Neumann and G. Weikum, "The RDF-3X engine for scalable management of RDF data," VLDB J., vol. 19, no. 1, pp. 91--113, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. W. Fan, J. Li, X. Wang, and Y. Wu, "Query preserving graph compression," in SIGMOD, 2012, pp. 157--168. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. W. Fan, X. Wang, and Y. Wu, "Querying big graphs within bounded resources," in SIGMOD, 2014, pp. 301--312. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. V. Satuluri, S. Parthasarathy, and Y. Ruan, "Local graph sparsification for scalable clustering," in SIGMOD, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. D. A. Spielman and N. Srivastava, "Graph sparsification by effective resistances," SIAM J. Comput., vol. 40, no. 6, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. W. Fan, F. Geerts, Y. Cao, T. Deng, and P. Lu, "Querying big data by accessing small data," in PODS, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. G. Buehrer and K. Chellapilla, "A scalable pattern mining approach to web graph compression with communities," in WSDM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. D. J. Abadi, A. Marcus, S. Madden, and K. Hollenbach, "SW-Store: a vertically partitioned DBMS for semantic web data management," VLDB J., vol. 18, no. 2, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Gubichev and T. Neumann, "Exploiting the query structure for efficient join ordering in SPARQL queries," in EDBT, 2014.Google ScholarGoogle Scholar
  14. L. Zou, M. T. Özsu, L. Chen, X. Shen, R. Huang, and D. Zhao, "gStore: a graph-based SPARQL query engine," VLDB J., vol. 23, no. 4, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Leskovec and A. Krevl, "SNAP Datasets: Stanford large network dataset collection," http://snap.stanford.edu/data.Google ScholarGoogle Scholar
  16. P. Boldi and S. Vigna, "The webgraph framework I: compression techniques," in WWW, 2004, pp. 595--602. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan, "On compressing social networks," in KDD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Y. Lim, U. Kang, and C. Faloutsos, "SlashBurn: Graph compression and mining beyond caveman communities," TKDE, vol. 26, no. 12, pp. 3077--3089, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  19. X. Yan, P. S. Yu, and J. Han, "Graph indexing: A frequent structure-based approach," in SIGMOD, 2004, pp. 335--346. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. B. Shao, H. Wang, and Y. Li, "Trinity: a distributed graph engine on a memory cloud," in SIGMOD, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Cheng, J. X. Yu, B. Ding, P. S. Yu, and H. Wang, "Fast graph pattern matching," in ICDE, 2008, pp. 913--922. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Scalable Pattern Matching over Compressed Graphs via Dedensification

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
          August 2016
          2176 pages
          ISBN:9781450342322
          DOI:10.1145/2939672

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 13 August 2016

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          KDD '16 Paper Acceptance Rate66of1,115submissions,6%Overall Acceptance Rate1,133of8,635submissions,13%

          Upcoming Conference

          KDD '24

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader