skip to main content
research-article

Efficient indexing and querying over syntactically annotated trees

Published:01 July 2012Publication History
Skip Abstract Section

Abstract

Natural language text corpora are often available as sets of syntactically parsed trees. A wide range of expressive tree queries are possible over such parsed trees that open a new avenue in searching over natural language text. They not only allow for querying roles and relationships within sentences, but also improve search effectiveness compared to flat keyword queries. One major drawback of current systems supporting querying over parsed text is the performance of evaluating queries over large data. In this paper we propose a novel indexing scheme over unique subtrees as index keys. We also propose a novel root-split coding scheme that stores subtree structural information only partially, thus reducing index size and improving querying performance. Our extensive set of experiments show that root-split coding reduces the index size of any interval coding which stores individual node numbers by a factor of 50% to 80%, depending on the sizes of subtrees indexed. Moreover, We show that our index using root-split coding, outperforms previous approaches by at least an order of magnitude in terms of the response time of queries.

References

  1. S. Al-Khalifa, H. Jagadish, N. Koudas, J. Patel, D. Srivastava, and Y. Wu. Structural Joins: A Primitive for Efficient XML Query Pattern Matching. In ICDE, pages 141--152, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. The aquaint corpus of english news text, 2002. http://www.ldc.upenn.edu/Catalog/docs/LDC2002T31/.Google ScholarGoogle Scholar
  3. M. Banko, M. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. Open Information Extraction from the Web. In IJCAI, pages 2670--2676, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. S. Bird, Y. Chen, S. Davidson, H. Lee, and Y. Zheng. Designing and Evaluating an XPath Dialect for Linguistic Queries. In ICDE, pages 52--52, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. N. Bruno, N. Koudas, and D. Srivastava. Holistic Twig Joins: Optimal XML Pattern Matching. In SIGMOD, pages 310--321, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. Christensen, Mausam, S. Soderland, and E. Etzioni. Semantic Role Labeling for Open Information Extraction. In NAACL, pages 52--60, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. P. Chubak. Indexing and Querying Natural Language Text. PhD thesis, University of Alberta, 2012.Google ScholarGoogle Scholar
  8. G. Gou and R. Chirkova. Efficiently Querying Large XML Data Repositories: A Survey. Transactions on Knowledge and Data Engineering, 19(10):1381--1403, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. D. Klein and C. Manning. Accurate Unlexicalized Parsing. In ACL, pages 423--430, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C. Kwok, O. Etzioni, and D. Weld. Scaling Question Answering to the Web. Transactions on Information Systems, 19(3):242--262, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. Moldovan, S. Harabagiu, R. Girju, P. Morarescu, F. Lacatusu, A. Novischi, A. Badulescu, and O. Bolohan. LCC Tools for Question Answering. In TREC, pages 388--397, 2002.Google ScholarGoogle Scholar
  12. G. Pass, A. Chowdhury, and C. Torgeson. A Picture of Search. In INFOSCALE, page 1, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. B. Randall. CorpusSearch Users Manual. University of Pennsylvania, 2000. http://www.ling.upenn.edu/~dringe/CorpStuff/Manual/Contents.html.Google ScholarGoogle Scholar
  14. D. Rohde. TGrep2 User Manual version 1.15. Massachusetts Institute of Technology. http://tedlab.mit.edu/dr/Tgrep2, 2005.Google ScholarGoogle Scholar
  15. D. Shasha, J. Wang, and R. Giugno. Algorithmics and Applications of Tree and Graph Searching. In PODS, pages 39--52, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Shasha, J. Wang, H. Shan, and K. Zhang. Atreegrep: Approximate Searching in Unordered Trees. In SSDBM, pages 89--98, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. Williams, J. Huan, and W. Wang. Graph Database Indexing Using Structured Graph Decomposition. In ICDE, pages 976--985, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  18. F. Wu and D. Weld. Open Information Extraction Using Wikipedia. In ACL, pages 118--127, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. X. Yan, P. Yu, and J. Han. Graph Indexing: A Frequent Structure-Based Approach. In SIGMOD, pages 335--346, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Yoshikawa, T. Amagasa, T. Shimura, and S. Uemura. XRel: a Path-Based Approach to Storage and Retrieval of XML Documents Using Relational Databases. Transactions on Internet Technology, 1(1):110--141, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. Yue. A Simple Proof of the Inequality FFD(L) ≤ 11/9 OPT(L)+1, ∀ L for the FFD Bin-Packing Algorithm. Acta Mathematicae Applicatae Sinica (English Series), 7(4):321--331, 1991.Google ScholarGoogle Scholar
  22. C. Zhang, J. Naughton, D. DeWitt, Q. Luo, and G. Lohman. On Supporting Containment Queries in Relational Database Management Systems. In SIGMOD, pages 425--436, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. Zhang, M. Hu, and J. Yang. TreePi: A Novel Graph Indexing Method. In ICDE, pages 966--975, 2007.Google ScholarGoogle ScholarCross RefCross Ref

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 5, Issue 11
    July 2012
    608 pages

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 July 2012
    Published in pvldb Volume 5, Issue 11

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader