skip to main content
10.1145/1376616.1376676acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Efficient algorithms for exact ranked twig-pattern matching over graphs

Published:09 June 2008Publication History

ABSTRACT

Querying large-scale graph-structured data with twig patterns is attracting growing interest. Generally, a twig pattern could have an extremely large, potentially exponential, number of matches in a graph. Retrieving and returning to the user this many answers may both incur high computational overhead and overwhelm the user.

In this paper we propose two efficient algorithms, DP-B and DP-P, for retrieving top-ranked twig-pattern matches from large graphs. Our first algorithm, DP-B, is able to retrieve exact top-ranked answer matches from potentially exponentially many matches in time and space linear in the size of our data inputs even in the worst case. Further, beyond the linear-cost result of DP-B, our second algorithm, DP-P, could take far less than linear time and space cost in practice. To the best of our knowledge, our algorithms are the first to have these performance properties. Our experimental results demonstrate the high performance of both algorithms on large datasets. We also analyze and compare the performance trade-off between DP-B and DP-P from the theoretical and practical viewpoints.

References

  1. R. Agrawal, A. Borgida, and H. V. Jagadish. Efficient management of transitive relationships in large data and knowledge bases. SIGMOD, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: A system for keyword-based search over relational databases. ICDE, 2002.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M. D. Atkinson, J.-R. Sack, N. Santoro, and T. Strothotte. Min-max heaps and generalized priority queues. Communications of the ACM: 29(10), 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. ICDE, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal XML pattern matching. SIGMOD, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. SODA, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Cohen, J. Mamou, Y. Kanza, and Y. Sagiv. XSEarch: A semantic search engine for XML. VLDB, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to algorithms (2nd edition). MIT Press, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. B. Dawes and D. Abrahams. Boost C++ Libraries. http://www.boost.org/.Google ScholarGoogle Scholar
  10. DBLP. DBLP bibliography. http://dblp.uni-trier.de/xml/.Google ScholarGoogle Scholar
  11. B. Ding, J. X. Yu, S. Wang, L. Qin, X. Zhang, and X. Lin. Finding top-k min-cost connected trees in databases. ICDE, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  12. R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. PODS, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. G. Gou and R. Chirkova. Efficiently querying large XML data repositories: a survey. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. L. Guo, F. Shao, C. Botev, and J. Shanmugasundaram. XRANK: Ranked keyword search over XML documents. SIGMOD, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. SIGMOD, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. H. He, H. Wang, J. Yang, and P. S. Yu. BLINKS: ranked keyword searches on graphs. SIGMOD, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. V. Hristidis and Y. Papakonstantinou. DISCOVER: Keyword search in relational databases. VLDB, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H. Karambelkar. Bidirectional expansion for keyword search on graph databases. VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. W.-S. Li, K. S. Candan, Q. Vu, and D. Agrawal. Retrieving and organizing web pages by information unit. WWW, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Oracle. Oracle Berkeley DB Java Edition (Version 3.2.44). http://www.oracle.com/technology/products/berkeley-db/ je/index.html.Google ScholarGoogle Scholar
  21. Y. Qi, K. S. Candan, and M. L. Sapino. Sum-Max monotonic ranked joins for evaluating top-k twig queries on weighted data graphs. VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. G. Reich and P. Widmayer. Beyond Steiner?s problem: A VLSI oriented generalization. WG Workshop, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. W3C. XML path language (XPath) Version 1.0. http://www.w3.org/TR/xpath, 1999.Google ScholarGoogle Scholar
  24. W3C. Resource Description framework (RDF). http://www.w3.org/RDF/, 2004.Google ScholarGoogle Scholar
  25. H.Wang, H. He, J. Yang, P. S. Yu, and J. X. Yu. Dual labeling: Answering graph reachability queries in constant time. ICDE, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Efficient algorithms for exact ranked twig-pattern matching over graphs

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
        June 2008
        1396 pages
        ISBN:9781605581026
        DOI:10.1145/1376616

        Copyright © 2008 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 9 June 2008

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader