ABSTRACT
Querying large-scale graph-structured data with twig patterns is attracting growing interest. Generally, a twig pattern could have an extremely large, potentially exponential, number of matches in a graph. Retrieving and returning to the user this many answers may both incur high computational overhead and overwhelm the user.
In this paper we propose two efficient algorithms, DP-B and DP-P, for retrieving top-ranked twig-pattern matches from large graphs. Our first algorithm, DP-B, is able to retrieve exact top-ranked answer matches from potentially exponentially many matches in time and space linear in the size of our data inputs even in the worst case. Further, beyond the linear-cost result of DP-B, our second algorithm, DP-P, could take far less than linear time and space cost in practice. To the best of our knowledge, our algorithms are the first to have these performance properties. Our experimental results demonstrate the high performance of both algorithms on large datasets. We also analyze and compare the performance trade-off between DP-B and DP-P from the theoretical and practical viewpoints.
- R. Agrawal, A. Borgida, and H. V. Jagadish. Efficient management of transitive relationships in large data and knowledge bases. SIGMOD, 1989. Google ScholarDigital Library
- S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: A system for keyword-based search over relational databases. ICDE, 2002.Google ScholarDigital Library
- M. D. Atkinson, J.-R. Sack, N. Santoro, and T. Strothotte. Min-max heaps and generalized priority queues. Communications of the ACM: 29(10), 1986. Google ScholarDigital Library
- G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. ICDE, 2002. Google ScholarDigital Library
- N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal XML pattern matching. SIGMOD, 2002. Google ScholarDigital Library
- E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. SODA, 2002. Google ScholarDigital Library
- S. Cohen, J. Mamou, Y. Kanza, and Y. Sagiv. XSEarch: A semantic search engine for XML. VLDB, 2003. Google ScholarDigital Library
- T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to algorithms (2nd edition). MIT Press, 2001. Google ScholarDigital Library
- B. Dawes and D. Abrahams. Boost C++ Libraries. http://www.boost.org/.Google Scholar
- DBLP. DBLP bibliography. http://dblp.uni-trier.de/xml/.Google Scholar
- B. Ding, J. X. Yu, S. Wang, L. Qin, X. Zhang, and X. Lin. Finding top-k min-cost connected trees in databases. ICDE, 2007.Google ScholarCross Ref
- R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. PODS, 2001. Google ScholarDigital Library
- G. Gou and R. Chirkova. Efficiently querying large XML data repositories: a survey. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2007. Google ScholarDigital Library
- L. Guo, F. Shao, C. Botev, and J. Shanmugasundaram. XRANK: Ranked keyword search over XML documents. SIGMOD, 2003. Google ScholarDigital Library
- V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. SIGMOD, 1996. Google ScholarDigital Library
- H. He, H. Wang, J. Yang, and P. S. Yu. BLINKS: ranked keyword searches on graphs. SIGMOD, 2007. Google ScholarDigital Library
- V. Hristidis and Y. Papakonstantinou. DISCOVER: Keyword search in relational databases. VLDB, 2002. Google ScholarDigital Library
- V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H. Karambelkar. Bidirectional expansion for keyword search on graph databases. VLDB, 2005. Google ScholarDigital Library
- W.-S. Li, K. S. Candan, Q. Vu, and D. Agrawal. Retrieving and organizing web pages by information unit. WWW, 2001. Google ScholarDigital Library
- Oracle. Oracle Berkeley DB Java Edition (Version 3.2.44). http://www.oracle.com/technology/products/berkeley-db/ je/index.html.Google Scholar
- Y. Qi, K. S. Candan, and M. L. Sapino. Sum-Max monotonic ranked joins for evaluating top-k twig queries on weighted data graphs. VLDB, 2007. Google ScholarDigital Library
- G. Reich and P. Widmayer. Beyond Steiner?s problem: A VLSI oriented generalization. WG Workshop, 1989. Google ScholarDigital Library
- W3C. XML path language (XPath) Version 1.0. http://www.w3.org/TR/xpath, 1999.Google Scholar
- W3C. Resource Description framework (RDF). http://www.w3.org/RDF/, 2004.Google Scholar
- H.Wang, H. He, J. Yang, P. S. Yu, and J. X. Yu. Dual labeling: Answering graph reachability queries in constant time. ICDE, 2006. Google ScholarDigital Library
Index Terms
- Efficient algorithms for exact ranked twig-pattern matching over graphs
Recommendations
Twig Pattern Matching with Positional Predicates in XML Queries
WISA '13: Proceedings of the 2013 10th Web Information System and Application ConferenceTwig pattern matching is core operation in XML query processing. Although holistic twig pattern matching algorithms have been studied intensively nowadays, little attention is paid on positional predicates defined on navigation axis. These algorithms ...
Twig pattern matching running on XML streams
APWeb'12: Proceedings of the 14th international conference on Web Technologies and ApplicationsTwig pattern matching plays an important role in XML query processing, holistic twig pattern matching algorithms have been proposed and are considered to be effective since they avoid producing large number of intermediate results. Meanwhile, automaton-...
Exact Perfect Matching in Complete Graphs
A red-blue graph is a graph where every edge is colored either red or blue. The exact perfect matching problem asks for a perfect matching in a red-blue graph that has exactly a given number of red edges. We show that for complete and bipartite complete ...
Comments