ABSTRACT
Virtually all proposals for querying XML include a class of query we term “containment queries”. It is also clear that in the foreseeable future, a substantial amount of XML data will be stored in relational database systems. This raises the question of how to support these containment queries. The inverted list technology that underlies much of Information Retrieval is well-suited to these queries, but should we implement this technology (a) in a separate loosely-coupled IR engine, or (b) using the native tables and query execution machinery of the RDBMS? With option (b), more than twenty years of work on RDBMS query optimization, query execution, scalability, and concurrency control and recovery immediately extend to the queries and structures that implement these new operations. But all this will be irrelevant if the performance of option (b) lags that of (a) by too much. In this paper, we explore some performance implications of both options using native implementations in two commercial relational database systems and in a special purpose inverted list engine. Our performance study shows that while RDBMSs are generally poorly suited for such queries, under conditions they can outperform an inverted list engine. Our analysis further identifies two significant causes that differentiate the performance of the IR and RDBMS implementations: the join algorithms employed and the hardware cache utilization. Our results suggest that contrary to most expectations, with some modifications, a native implementations in an RDBMS can support this class of query much more efficiently.
- 1.S. Abiteboul, D. Quass, J. McHuge, J. Widom, and J. Wiener. The lorel query language for semistructured data. International Journal on Digital Libraries, 1(1):68-88, April 1997.Google ScholarCross Ref
- 2.A. Ailamaki, D.J.DeWitt, M.D.Hill, and D.A.Wood. Dbmss on a modern processor: Where does time go? In Proceedings of the 25th International Conference on Very Large Data Bases, pages 266-277, September 1999. Google ScholarDigital Library
- 3.Ricardo Baeza-Yates and Gonzalo Navarro. Integrating contents and structure in text retrieval. SIGMOD Record, 25(1):67-69, March 1996. Google ScholarDigital Library
- 4.G. E. Blake, M. P. Consens, P. Kilpelainen, P.-A. Larson, T. Snider, and F. W. Tompa. Text/relational database management systems: Harmonizing sql and sgml. In Proceedings of the International Conference on Applications of Databases, pages 267-280, June 1994.Google ScholarCross Ref
- 5.Don Chamberlin, Peter Frankhauser, Massimo Marchiori, and Jonathan Robie. Xml query requirements. In W3C Working Draft, http://www.w3.org/TR/xmlquery-req, August 2000.Google Scholar
- 6.Don Chamberlin, Jonathan Robie, and Daniela Florescu. Quilt: An xml query language for heterogeneous data sources. In Lecture Notes in Computer Science, Springer-Verlag, 2000. Google ScholarDigital Library
- 7.C. L. Clarke, G. V. Cormack, and F. J. Burkowski. An algebra for structured text search and a framework for its implementation. The Computer Journal, 38(1):43-56, 1995.Google ScholarDigital Library
- 8.C.L.Clarke, G.V.Cormack, and F.J.Burkowski. Schema-independent retrieval from heterogeneous structured text. In Fourth Annual Symposium on Document Analysis and Information Retrieval, pages 279-289, 1995.Google Scholar
- 9.Robin Cover. The xml cover pages. In http://www.oasis-open.org/cover/xml.html, July 2000.Google Scholar
- 10.Tuong Dao, Ron Sacks-Davis, and James A. Thom. Indexing structured text for queries on containment relationships. In Proceedings of the 7th Australasian Database Conference, 1996.Google Scholar
- 11.Stefan Dessloch and Nelson Mattos. Integrating sql databases with content-specific search engines. In Proceedings of the 23rd VLDB Conference, 1997. Google ScholarDigital Library
- 12.Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, and Dan Suciu. A query language for xml. In Proceedings of Eighth International World Wide Web Conference, 1999. Google ScholarDigital Library
- 13.Daniela Florescu and Donald Kossman. Storing and querying xml data using an rdbms. IEEE Data Engineering Bulletin, 22(3):27-34, 1999.Google Scholar
- 14.Daniela Florescu, Donald Kossmann, and Ioana Manolescu. Integrating keyword search into xml query processing. WWW9/Computer Networks, 33(1-6):119-135, 2000. Google ScholarDigital Library
- 15.International Organization for Standardization. Information processing-text and office systems-standard generalised markup language (sgml), iso/iec 8879, 1986.Google Scholar
- 16.GMD. Gmd-ipsi xql engine. http://xml.darmstadt.gmd.de/xql/index.html, August 1999.Google Scholar
- 17.Roy Goldman and Jennifer Widom. Wsq/dsq: A practical approach for combined querying of databases and the web. In Proceedings of the 2000 Sigmod Conference, pages 285-296, 2000. Google ScholarDigital Library
- 18.IBM. Db2 text extender. http://www- 4.ibm.com/software/data/db2/extenders/text.htm, July 2000.Google Scholar
- 19.INRIA. Minixyleme project. http://wwwrocq.inria.fr/caguilera/xoql/minixyleme/readme.html.Google Scholar
- 20.Intel. Intel architecture software developer's manual, volume 1: Basic architecture, 1999.Google Scholar
- 21.J.Clark and S.DeRose. Xml path language (xpath) version 1.0, w3c recommendation. In http://www.w3.org/TR/xpath.html, November 1999.Google Scholar
- 22.J.McHugh, J.Widom, S.Abiteboul, Q.Luo, and A.Rajaraman. Indexing semistructured data. In Stanford Technical Report, January 1998.Google Scholar
- 23.J.McHugh, S.Abiteboul, R.Goldman, D.Quass, and J.Widom. Lore: A database management system for semistructured data. SIGMOD Record, 26(3):54-66, 1997. Google ScholarDigital Library
- 24.J.Robie, J.Lapp, and D.Schach. Xml query language (xql). http://www.w3.org/TandS/QL/QL98/pp/xql.html.Google Scholar
- 25.University of Washington. The tukwila data integration system. http://data.cs.washington.edu/integration/tukwila/.Google Scholar
- 26.University of Wisconsin. The niagara system. http://www.cs.wisc.edu/niagara/.Google Scholar
- 27.Oracle. Oracle8i intermedia text reference, release 8.1.5. http://oradoc.photo.net/ora81/DOC/inter.815/a67843/toc.htm.Google Scholar
- 28.Steve Putz. Using a relational database for an inverted text index. In Xerox Palo Alto Research Center Technical Report SSL-91-20, Xerox PARC, January 1991.Google Scholar
- 29.Jun Rao and Kenneth A. Ross. Making b+-trees cache conscious in main memory. In SIGMOD Conference, pages 475-486, 2000. Google ScholarDigital Library
- 30.Ron Sacks-Davis, Timothy Arnold-Moore, and Justin Zobel. Database systems for structured documents. In Proceedings of the International Symposium on Advanced Database Technologies and Their Integration (ADTI'94), pages 272-283, October 1994.Google Scholar
- 31.Arnaud Sahuguet. Kweelt. http://db.cis.upenn.edu/Kweelt/.Google Scholar
- 32.G. Salton and M.J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983. Google ScholarDigital Library
- 33.S.Cluet, S.Jacqmin, and J.Simeon. The new yatl: Design and specifications. In Technical Report, INRIA, 1999.Google Scholar
- 34.Jayavel Shanmugasundaram, He Gang, Kristin Tufte, Chun Zhang, David DeWitt, and Jeffrey Naughton. Relational databases for querying xml documents: Limitations and opportunities. In Proceedings of the 1999 VLDB Conference, September 1999. Google ScholarDigital Library
- 35.Jayavel Shanmugasundaram, E. Shekita, R.Barr, Michael J. Carey, Bruce G. Lindsay, Hamid Pirahesh, and Berthold Reinwald. Efficiently publishing relational data as xml documents. In VLDB Conference, September 2000. Google ScholarDigital Library
- 36.A. Shatdal, C. Kant, and J.F.Naughton. Cache conscious algorithms for relational query processing. In Proceedings of the 20th VLDB Conference, 1994. Google ScholarDigital Library
- 37.Sleepycat Software. The berkeley database. http://www.sleepycat.com.Google Scholar
- 38.T.Arnold-Moore, M.Fuller, B.Lowe, J.Thom, and R.Wilkinson. The elf data model and sgql query language for structured document databases. In Proceedings of the Australasian Database Conference, Adelaide, Australia, pages 17-26, 1995.Google Scholar
- 39.Tak W. Yan and Jurgen Annevelink. Integrating a structured-text retrieval system with an object-oriented database system. In VLDB Conference, September 1994. Google ScholarDigital Library
- 40.Chun Zhang, Jeffrey Naughton, David DeWitt, Qiong Luo, and Guy Lohman. On supporting containment queries in relational database management systems. full version. http://www.cs.wisc.edu/niagara/papers/ZND+01full.pdf, 2001.Google Scholar
- 41.G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Reading MA, 1949.Google Scholar
Index Terms
- On supporting containment queries in relational database management systems
Recommendations
On supporting containment queries in relational database management systems
Virtually all proposals for querying XML include a class of query we term “containment queries”. It is also clear that in the foreseeable future, a substantial amount of XML data will be stored in relational database systems. This raises the question of ...
Decidable containment of recursive queries
Database theoryOne of the most important reasoning tasks on queries is checking containment, i.e., verifying whether one query yields necessarily a subset of the result of another one. Query containment is crucial in several contexts, such as query optimization, query ...
Containment of nested XML queries
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30Query containment is the most fundamental relationship between a pair of database queries: a query Q is said to be contained in a query Q′ if the answer for Q is always a subset of the answer for Q′, independent of the current state of the database. ...
Comments