2008 | OriginalPaper | Buchkapitel
Coders for the Burrows-Wheeler Transform
Erschienen in: The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching
Verlag: Springer US
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
Like most transforms, the Burrows-Wheeler Transform does not change the size of the file that has been transformed, but merely rearranges it so that it will be easier to represent it compactly. It then needs to be coded using a second phase which we will refer to as the “Local to Global Transform” (LGT). Figure 3.1 shows a section of the transformed text for Shakespeare's “Hamlet”, which reveals the kind of regularities that the BWT exposes. These characters are ones that appear before the context nd; initially the nd is followed by a space, and hence a is very common, but then the character is followed by ndeed, where the i becomes common, and the last few characters precede nder.
Clearly the text in Figure 3.1 contains a lot of patterns, and therefore will be easy to compress. Many sophisticated techniques have been proposed to exploit the regularities of the BWT transformed text, and yet it has emerged that one of the simplest approaches (RleAc, based on run-length encoding and an order-zero arithmetic coder) gives the best compression and is also very fast compared with more complicated methods. We will begin this section by looking at this simple coder, but later we will also review various other approaches that have been proposed, including Burrows and Wheeler's original “Move to Front” (MTF) list, inversion frequencies, distance coding, frequency counting methods, wavelet trees, and alternative permutations. We will also consider the effect of the block size on compression performance.