skip to main content
10.1145/872757.872775acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

XPRESS: a queriable compression for XML data

Authors Info & Claims
Published:09 June 2003Publication History

ABSTRACT

Like HTML, many XML documents are resident on native file systems. Since XML data is irregular and verbose, the disk space and the network bandwidth are wasted. To overcome the verbosity problem, the research on compressors for XML data has been conducted. However, some XML compressors do not support querying compressed data, while other XML compressors which support querying compressed data blindly encode tags and data values using predefined encoding methods. Thus, the query performance on compressed XML data is degraded.In this paper, we propose XPRESS, an XML compressor which supports direct and efficient evaluations of queries on compressed XML data. XPRESS adopts a novel encoding method, called reverse arithmetic encoding, which is intended for encoding label paths of XML data, and applies diverse encoding methods depending on the types of data values. Experimental results with real life data sets show that XPRESS achieves significant improvements on query performance for compressed XML data and reasonable compression ratios. On the average, the query performance of XPRESS is 2.83 times better than that of an existing XML compressor and the compression ratio of XPRESS is 73%.

References

  1. A. Aboulnaga, A. R. Alameldeen, and J. F. Naughton. Estimating the Selectivity of XML Path Expressions for Internet Scale Applications. In Proceedings of 27th International Conference on Very Large Data Bases, pages 591--600, September 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Anonymous. http://www.cs.washington.edu/research/projects/xmltk/www/xmlproperties.html.]]Google ScholarGoogle Scholar
  3. S. Boag, D. Chamberlin, M. F. Fernandez, D. Florescu, J. Robie, and J. Simeon. XQuery 1.0: An XML Query Language. Working Draft, http://www.w3.org/TR/2002/WD-xquery-20020816, 16 August 2002.]]Google ScholarGoogle Scholar
  4. T. Bray, J. Paoli, C. M. Sperberg-McQueen, and E. Maler. Extensible Markup Language (XML) 1.0. W3C Recommendation, http://www.w3.org/TR/REC-xml, 1998.]]Google ScholarGoogle Scholar
  5. C.-W. Chung, J.-K. Min, and K. Shim. APEX: An Adaptive Path Index for XML Data. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pages 121--132, June 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. Clark and S. DeRose. XML Path Language(XPath) Version 1.0. W3C Recommendation, http://www.w3.org/TR/xpath, November 1999.]]Google ScholarGoogle Scholar
  7. R. Cover. The XML Cover Pages. http://www.oasis-open.org/cover/xml.html, 2001.]]Google ScholarGoogle Scholar
  8. M. F. Fernandez and D. Suciu. Optimizing Regular Path Expressions Using Graph Schemas. In Proceedings of the 14th International Conference on Data Engineering, pages 14--23, February 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. F. Fernandez, W. C. Tan, and D. Suciu. SilkRoute: trading between relations and XML. WWW9/Computer Networks, 33(1--6):723--745, June 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. D. Florescu and D. Kossman. Storing and Querying XML Data using an RDMBS. IEEE Data Engineering Bulletin, 22(3):27--34, September 1999.]]Google ScholarGoogle Scholar
  11. R. Goldman and J. Widom. DataGuides: Enable Query Formulation and Optimization in Semistructured DataBases. In Proceedings of 23rd International Conference on Very Large Data Bases, pages 436--445, August 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. T. Grust. Accelerating XPath Location Steps. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pages 109--120, June 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. E. R. Harold. Long Baseball Examples from The XML Bible. ibiblio, http://www.ibiblio.org/xml/examples/baseball/.]]Google ScholarGoogle Scholar
  14. P. G. Howard and J. S. Vitter. Analysis of Arithmetic Coding for Data Compression. In Proceedings of the IEEE Data Compression Conference, pages 3--12, April 1991.]]Google ScholarGoogle ScholarCross RefCross Ref
  15. D. A. Huffman. A Method for the Construction of Minimum Redandancy Codes. In Proceedings of the Institute of Radio Engineers 40, pages 1098--1101, September 1952.]]Google ScholarGoogle Scholar
  16. H. Liefke and D. Suciu. XMill: An Efficient Compressor for XML Data. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pages 153--164, May 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. C.-W. Park, J.-K. Min, and C.-W. Chung. Structural Function Inlining Technique for Structurally Recursive XML Queries. In Proceedings of 28th International Conference on Very Large Data Bases, pages 83--94, August 2002.]]Google ScholarGoogle Scholar
  18. D. Salomon. Data Compression, the complete reference. Springer-Verlag New York, Inc, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. J. Shanmugasundaram, E. J. Shekita, R. Barr, M. J. Carey, B. G. Lindsay, H. Pirahesh, and B. Reinwald. Efficiently Publishing Relational Data as XML Documents. In Proceedings of 26th International Conference on Very Large Data Bases, pages 65--76, September 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. C. E. Shannon. A Mathematical Theory of Communication. Bell Syst. Tech. J., 27:398--403, July 1948.]]Google ScholarGoogle ScholarCross RefCross Ref
  21. T. Shimura, M. Yoshikawa, and S. Uemura. Storing and Retrieval of XML Documents using Object-Relational Databases. In Proceedings of 10th International Conference, DEXA, pages 206--217, August 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. I. Tatarinov, S. D. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang. Storing and Querying Ordered XML Using a Relational Database System. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pages 204--215, June 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. P. M. Tolani and J. R. Haritsa. XGRIND: A Query-friendly XML Compressor. In Proceedings of 18th International Conference on Database Engineering, February 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. I. H. Witten, R. M. Neal, and J. G. Cleary. Arithmetic Coding for Data Compression. Communications of the ACM, 30(6):520--540, June 1987.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. XPRESS: a queriable compression for XML data

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            SIGMOD '03: Proceedings of the 2003 ACM SIGMOD international conference on Management of data
            June 2003
            702 pages
            ISBN:158113634X
            DOI:10.1145/872757

            Copyright © 2003 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 9 June 2003

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • Article

            Acceptance Rates

            SIGMOD '03 Paper Acceptance Rate53of342submissions,15%Overall Acceptance Rate785of4,003submissions,20%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader