ABSTRACT
Like HTML, many XML documents are resident on native file systems. Since XML data is irregular and verbose, the disk space and the network bandwidth are wasted. To overcome the verbosity problem, the research on compressors for XML data has been conducted. However, some XML compressors do not support querying compressed data, while other XML compressors which support querying compressed data blindly encode tags and data values using predefined encoding methods. Thus, the query performance on compressed XML data is degraded.In this paper, we propose XPRESS, an XML compressor which supports direct and efficient evaluations of queries on compressed XML data. XPRESS adopts a novel encoding method, called reverse arithmetic encoding, which is intended for encoding label paths of XML data, and applies diverse encoding methods depending on the types of data values. Experimental results with real life data sets show that XPRESS achieves significant improvements on query performance for compressed XML data and reasonable compression ratios. On the average, the query performance of XPRESS is 2.83 times better than that of an existing XML compressor and the compression ratio of XPRESS is 73%.
- A. Aboulnaga, A. R. Alameldeen, and J. F. Naughton. Estimating the Selectivity of XML Path Expressions for Internet Scale Applications. In Proceedings of 27th International Conference on Very Large Data Bases, pages 591--600, September 2001.]] Google ScholarDigital Library
- Anonymous. http://www.cs.washington.edu/research/projects/xmltk/www/xmlproperties.html.]]Google Scholar
- S. Boag, D. Chamberlin, M. F. Fernandez, D. Florescu, J. Robie, and J. Simeon. XQuery 1.0: An XML Query Language. Working Draft, http://www.w3.org/TR/2002/WD-xquery-20020816, 16 August 2002.]]Google Scholar
- T. Bray, J. Paoli, C. M. Sperberg-McQueen, and E. Maler. Extensible Markup Language (XML) 1.0. W3C Recommendation, http://www.w3.org/TR/REC-xml, 1998.]]Google Scholar
- C.-W. Chung, J.-K. Min, and K. Shim. APEX: An Adaptive Path Index for XML Data. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pages 121--132, June 2002.]] Google ScholarDigital Library
- J. Clark and S. DeRose. XML Path Language(XPath) Version 1.0. W3C Recommendation, http://www.w3.org/TR/xpath, November 1999.]]Google Scholar
- R. Cover. The XML Cover Pages. http://www.oasis-open.org/cover/xml.html, 2001.]]Google Scholar
- M. F. Fernandez and D. Suciu. Optimizing Regular Path Expressions Using Graph Schemas. In Proceedings of the 14th International Conference on Data Engineering, pages 14--23, February 1998.]] Google ScholarDigital Library
- M. F. Fernandez, W. C. Tan, and D. Suciu. SilkRoute: trading between relations and XML. WWW9/Computer Networks, 33(1--6):723--745, June 2000.]] Google ScholarDigital Library
- D. Florescu and D. Kossman. Storing and Querying XML Data using an RDMBS. IEEE Data Engineering Bulletin, 22(3):27--34, September 1999.]]Google Scholar
- R. Goldman and J. Widom. DataGuides: Enable Query Formulation and Optimization in Semistructured DataBases. In Proceedings of 23rd International Conference on Very Large Data Bases, pages 436--445, August 1997.]] Google ScholarDigital Library
- T. Grust. Accelerating XPath Location Steps. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pages 109--120, June 2002.]] Google ScholarDigital Library
- E. R. Harold. Long Baseball Examples from The XML Bible. ibiblio, http://www.ibiblio.org/xml/examples/baseball/.]]Google Scholar
- P. G. Howard and J. S. Vitter. Analysis of Arithmetic Coding for Data Compression. In Proceedings of the IEEE Data Compression Conference, pages 3--12, April 1991.]]Google ScholarCross Ref
- D. A. Huffman. A Method for the Construction of Minimum Redandancy Codes. In Proceedings of the Institute of Radio Engineers 40, pages 1098--1101, September 1952.]]Google Scholar
- H. Liefke and D. Suciu. XMill: An Efficient Compressor for XML Data. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pages 153--164, May 2000.]] Google ScholarDigital Library
- C.-W. Park, J.-K. Min, and C.-W. Chung. Structural Function Inlining Technique for Structurally Recursive XML Queries. In Proceedings of 28th International Conference on Very Large Data Bases, pages 83--94, August 2002.]]Google Scholar
- D. Salomon. Data Compression, the complete reference. Springer-Verlag New York, Inc, 1998.]] Google ScholarDigital Library
- J. Shanmugasundaram, E. J. Shekita, R. Barr, M. J. Carey, B. G. Lindsay, H. Pirahesh, and B. Reinwald. Efficiently Publishing Relational Data as XML Documents. In Proceedings of 26th International Conference on Very Large Data Bases, pages 65--76, September 2000.]] Google ScholarDigital Library
- C. E. Shannon. A Mathematical Theory of Communication. Bell Syst. Tech. J., 27:398--403, July 1948.]]Google ScholarCross Ref
- T. Shimura, M. Yoshikawa, and S. Uemura. Storing and Retrieval of XML Documents using Object-Relational Databases. In Proceedings of 10th International Conference, DEXA, pages 206--217, August 1999.]] Google ScholarDigital Library
- I. Tatarinov, S. D. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang. Storing and Querying Ordered XML Using a Relational Database System. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pages 204--215, June 2002.]] Google ScholarDigital Library
- P. M. Tolani and J. R. Haritsa. XGRIND: A Query-friendly XML Compressor. In Proceedings of 18th International Conference on Database Engineering, February 2002.]] Google ScholarDigital Library
- I. H. Witten, R. M. Neal, and J. G. Cleary. Arithmetic Coding for Data Compression. Communications of the ACM, 30(6):520--540, June 1987.]] Google ScholarDigital Library
Index Terms
- XPRESS: a queriable compression for XML data
Recommendations
Mapping of bibliographical standards into XML
The most popular bibliographical standards, which prescribe the exchange of bibliographical data in machine readable form, are MARC (Machine Readable Cataloguing) and UNIMARC (Universal Machine Readable Cataloguing). This paper presents two schemas, ...
XML-based information mediation with MIX
SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of dataThe MIX mediator system, MIXm, is developed as part of the MIX Project at the San Diego Supercomputer Center, and the University of California, San Diego.1 MIXm uses XML as the common model for data exchange. Mediator views are expressed in XMAS (XML ...
Comments