skip to main content
10.1145/375663.375722acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

On supporting containment queries in relational database management systems

Authors Info & Claims
Published:01 May 2001Publication History

ABSTRACT

Virtually all proposals for querying XML include a class of query we term “containment queries”. It is also clear that in the foreseeable future, a substantial amount of XML data will be stored in relational database systems. This raises the question of how to support these containment queries. The inverted list technology that underlies much of Information Retrieval is well-suited to these queries, but should we implement this technology (a) in a separate loosely-coupled IR engine, or (b) using the native tables and query execution machinery of the RDBMS? With option (b), more than twenty years of work on RDBMS query optimization, query execution, scalability, and concurrency control and recovery immediately extend to the queries and structures that implement these new operations. But all this will be irrelevant if the performance of option (b) lags that of (a) by too much. In this paper, we explore some performance implications of both options using native implementations in two commercial relational database systems and in a special purpose inverted list engine. Our performance study shows that while RDBMSs are generally poorly suited for such queries, under conditions they can outperform an inverted list engine. Our analysis further identifies two significant causes that differentiate the performance of the IR and RDBMS implementations: the join algorithms employed and the hardware cache utilization. Our results suggest that contrary to most expectations, with some modifications, a native implementations in an RDBMS can support this class of query much more efficiently.

References

  1. 1.S. Abiteboul, D. Quass, J. McHuge, J. Widom, and J. Wiener. The lorel query language for semistructured data. International Journal on Digital Libraries, 1(1):68-88, April 1997.Google ScholarGoogle ScholarCross RefCross Ref
  2. 2.A. Ailamaki, D.J.DeWitt, M.D.Hill, and D.A.Wood. Dbmss on a modern processor: Where does time go? In Proceedings of the 25th International Conference on Very Large Data Bases, pages 266-277, September 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3.Ricardo Baeza-Yates and Gonzalo Navarro. Integrating contents and structure in text retrieval. SIGMOD Record, 25(1):67-69, March 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4.G. E. Blake, M. P. Consens, P. Kilpelainen, P.-A. Larson, T. Snider, and F. W. Tompa. Text/relational database management systems: Harmonizing sql and sgml. In Proceedings of the International Conference on Applications of Databases, pages 267-280, June 1994.Google ScholarGoogle ScholarCross RefCross Ref
  5. 5.Don Chamberlin, Peter Frankhauser, Massimo Marchiori, and Jonathan Robie. Xml query requirements. In W3C Working Draft, http://www.w3.org/TR/xmlquery-req, August 2000.Google ScholarGoogle Scholar
  6. 6.Don Chamberlin, Jonathan Robie, and Daniela Florescu. Quilt: An xml query language for heterogeneous data sources. In Lecture Notes in Computer Science, Springer-Verlag, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7.C. L. Clarke, G. V. Cormack, and F. J. Burkowski. An algebra for structured text search and a framework for its implementation. The Computer Journal, 38(1):43-56, 1995.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8.C.L.Clarke, G.V.Cormack, and F.J.Burkowski. Schema-independent retrieval from heterogeneous structured text. In Fourth Annual Symposium on Document Analysis and Information Retrieval, pages 279-289, 1995.Google ScholarGoogle Scholar
  9. 9.Robin Cover. The xml cover pages. In http://www.oasis-open.org/cover/xml.html, July 2000.Google ScholarGoogle Scholar
  10. 10.Tuong Dao, Ron Sacks-Davis, and James A. Thom. Indexing structured text for queries on containment relationships. In Proceedings of the 7th Australasian Database Conference, 1996.Google ScholarGoogle Scholar
  11. 11.Stefan Dessloch and Nelson Mattos. Integrating sql databases with content-specific search engines. In Proceedings of the 23rd VLDB Conference, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12.Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, and Dan Suciu. A query language for xml. In Proceedings of Eighth International World Wide Web Conference, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. 13.Daniela Florescu and Donald Kossman. Storing and querying xml data using an rdbms. IEEE Data Engineering Bulletin, 22(3):27-34, 1999.Google ScholarGoogle Scholar
  14. 14.Daniela Florescu, Donald Kossmann, and Ioana Manolescu. Integrating keyword search into xml query processing. WWW9/Computer Networks, 33(1-6):119-135, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. 15.International Organization for Standardization. Information processing-text and office systems-standard generalised markup language (sgml), iso/iec 8879, 1986.Google ScholarGoogle Scholar
  16. 16.GMD. Gmd-ipsi xql engine. http://xml.darmstadt.gmd.de/xql/index.html, August 1999.Google ScholarGoogle Scholar
  17. 17.Roy Goldman and Jennifer Widom. Wsq/dsq: A practical approach for combined querying of databases and the web. In Proceedings of the 2000 Sigmod Conference, pages 285-296, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. 18.IBM. Db2 text extender. http://www- 4.ibm.com/software/data/db2/extenders/text.htm, July 2000.Google ScholarGoogle Scholar
  19. 19.INRIA. Minixyleme project. http://wwwrocq.inria.fr/caguilera/xoql/minixyleme/readme.html.Google ScholarGoogle Scholar
  20. 20.Intel. Intel architecture software developer's manual, volume 1: Basic architecture, 1999.Google ScholarGoogle Scholar
  21. 21.J.Clark and S.DeRose. Xml path language (xpath) version 1.0, w3c recommendation. In http://www.w3.org/TR/xpath.html, November 1999.Google ScholarGoogle Scholar
  22. 22.J.McHugh, J.Widom, S.Abiteboul, Q.Luo, and A.Rajaraman. Indexing semistructured data. In Stanford Technical Report, January 1998.Google ScholarGoogle Scholar
  23. 23.J.McHugh, S.Abiteboul, R.Goldman, D.Quass, and J.Widom. Lore: A database management system for semistructured data. SIGMOD Record, 26(3):54-66, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. 24.J.Robie, J.Lapp, and D.Schach. Xml query language (xql). http://www.w3.org/TandS/QL/QL98/pp/xql.html.Google ScholarGoogle Scholar
  25. 25.University of Washington. The tukwila data integration system. http://data.cs.washington.edu/integration/tukwila/.Google ScholarGoogle Scholar
  26. 26.University of Wisconsin. The niagara system. http://www.cs.wisc.edu/niagara/.Google ScholarGoogle Scholar
  27. 27.Oracle. Oracle8i intermedia text reference, release 8.1.5. http://oradoc.photo.net/ora81/DOC/inter.815/a67843/toc.htm.Google ScholarGoogle Scholar
  28. 28.Steve Putz. Using a relational database for an inverted text index. In Xerox Palo Alto Research Center Technical Report SSL-91-20, Xerox PARC, January 1991.Google ScholarGoogle Scholar
  29. 29.Jun Rao and Kenneth A. Ross. Making b+-trees cache conscious in main memory. In SIGMOD Conference, pages 475-486, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. 30.Ron Sacks-Davis, Timothy Arnold-Moore, and Justin Zobel. Database systems for structured documents. In Proceedings of the International Symposium on Advanced Database Technologies and Their Integration (ADTI'94), pages 272-283, October 1994.Google ScholarGoogle Scholar
  31. 31.Arnaud Sahuguet. Kweelt. http://db.cis.upenn.edu/Kweelt/.Google ScholarGoogle Scholar
  32. 32.G. Salton and M.J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. 33.S.Cluet, S.Jacqmin, and J.Simeon. The new yatl: Design and specifications. In Technical Report, INRIA, 1999.Google ScholarGoogle Scholar
  34. 34.Jayavel Shanmugasundaram, He Gang, Kristin Tufte, Chun Zhang, David DeWitt, and Jeffrey Naughton. Relational databases for querying xml documents: Limitations and opportunities. In Proceedings of the 1999 VLDB Conference, September 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. 35.Jayavel Shanmugasundaram, E. Shekita, R.Barr, Michael J. Carey, Bruce G. Lindsay, Hamid Pirahesh, and Berthold Reinwald. Efficiently publishing relational data as xml documents. In VLDB Conference, September 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. 36.A. Shatdal, C. Kant, and J.F.Naughton. Cache conscious algorithms for relational query processing. In Proceedings of the 20th VLDB Conference, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. 37.Sleepycat Software. The berkeley database. http://www.sleepycat.com.Google ScholarGoogle Scholar
  38. 38.T.Arnold-Moore, M.Fuller, B.Lowe, J.Thom, and R.Wilkinson. The elf data model and sgql query language for structured document databases. In Proceedings of the Australasian Database Conference, Adelaide, Australia, pages 17-26, 1995.Google ScholarGoogle Scholar
  39. 39.Tak W. Yan and Jurgen Annevelink. Integrating a structured-text retrieval system with an object-oriented database system. In VLDB Conference, September 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. 40.Chun Zhang, Jeffrey Naughton, David DeWitt, Qiong Luo, and Guy Lohman. On supporting containment queries in relational database management systems. full version. http://www.cs.wisc.edu/niagara/papers/ZND+01full.pdf, 2001.Google ScholarGoogle Scholar
  41. 41.G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Reading MA, 1949.Google ScholarGoogle Scholar

Index Terms

  1. On supporting containment queries in relational database management systems

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in
              • Published in

                cover image ACM Conferences
                SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data
                May 2001
                630 pages
                ISBN:1581133324
                DOI:10.1145/375663

                Copyright © 2001 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 1 May 2001

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • Article

                Acceptance Rates

                SIGMOD '01 Paper Acceptance Rate44of293submissions,15%Overall Acceptance Rate785of4,003submissions,20%

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader