An analysis of the Burrows-Wheeler transform

Published: 01 May 2001

Abstract

The Burrows-Wheeler Transform (also known as Block-Sorting) underlies compression algorithms that are the state of the art in lossless data compression. In this paper, we analyze two algorithms that use this technique. The first is the original algorithm described by Burrows and Wheeler, which, despite its simplicity, outperforms the Gzip compressor. The second uses an additional run-length encoding step to improve compression. We prove that the compression ratio of both algorithms can be bounded in terms of the kth order empirical entropy of the input string for any k ≥ 0. We make no assumptions on the input, and we obtain bounds that hold in the worst case, that is, for every possible input string. All previous results for Block-Sorting algorithms were concerned with the average compression ratio and were established under the assumption that the input comes from a finite-order Markov source.
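
The two quantities the abstract relies on, the transform itself and the kth order empirical entropy, can be illustrated with a minimal sketch. The abstract defines neither, so everything below is an assumption based on the usual textbook formulations: the transform is taken as the last column of the sorted rotations of the input plus a sentinel, and H_k(s) is taken as (1/|s|) times the sum, over every length-k context w, of |w_s| H_0(w_s), where w_s collects the symbols that follow w in s. The function names (`bwt`, `inverse_bwt`, `h0`, `hk`) and the choice of sentinel are hypothetical, and the quadratic rotation sort is for illustration only, not how a practical compressor would compute it.

```python
from collections import Counter, defaultdict
from math import log2


def bwt(s: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations.

    The sentinel (an assumption of this sketch) marks the end of the input
    and must not occur in s.
    """
    if sentinel in s:
        raise ValueError("input must not contain the sentinel")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(r[-1] for r in rotations)


def inverse_bwt(last: str, sentinel: str = "\0") -> str:
    """Invert bwt() by repeatedly prepending the last column and re-sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the rotation that ends with the sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


def h0(s: str) -> float:
    """Zeroth-order empirical entropy of s, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum(c / n * log2(c / n) for c in Counter(s).values())


def hk(s: str, k: int) -> float:
    """k-th order empirical entropy of s, in bits per symbol.

    Groups each symbol by the k symbols preceding it, then averages the
    zeroth-order entropy of each group, weighted by group size.
    """
    if k == 0:
        return h0(s)
    followers = defaultdict(list)
    for i in range(len(s) - k):
        followers[s[i:i + k]].append(s[i + k])
    return sum(len(f) * h0("".join(f)) for f in followers.values()) / len(s)


if __name__ == "__main__":
    text = "banana"
    transformed = bwt(text)
    print(repr(transformed))
    print(inverse_bwt(transformed) == text)
    print(h0(text), hk(text, 1))
```

The transform tends to place symbols that share a following context next to each other, which is why bounds stated in terms of H_k are a natural fit for Block-Sorting compressors; the sketch only makes the two definitions concrete and does not reproduce the paper's bounds.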

