2013 | OriginalPaper | Chapter
Fully-Online Grammar Compression
Authors : Shirou Maruyama, Yasuo Tabei, Hiroshi Sakamoto, Kunihiko Sadakane
Published in: String Processing and Information Retrieval
Publisher: Springer International Publishing
Activate our intelligent search to find suitable subject content or patents.
Select sections of text to find matching patents with Artificial Intelligence. powered by
Select sections of text to find additional relevant content using AI-assisted search. powered by
We present a fully-online algorithm for constructing straight-line programs (SLPs). A naive array representation of an SLP with
n
variables on an alphabet of size
σ
requires
$2n\lg(n+\sigma)$
bits. As already shown in [Tabei et al., CPM’13], in offline setting, this size can be reduced to
$ n\lg(n+\sigma) + 2n + o(n)$
, which is asymptotically equal to the information-theoretic lower bound. Our algorithm achieves the same size in online setting, i.e., characters of an input string are given one by one to update the current SLP. With an auxiliary position array of size
$n\lg(N/n) + 3n + o(n)$
bits, our representation supports substring extractions in
O
((
m
+
h
)
t
) time where
N
is the length of the input string,
m
is the length of a substring extracted,
$h = O(\lg N)$
is the height of the SLP,
t
=
O
(1) in offline case, and
$t=O(\lg n/\lg\lg n)$
in online case. The working space is bounded by
$(1+\alpha)n\lg(n+\sigma)+n(3+\lg(\alpha n))$
bits depending on a constant
α
∈ (0,1], which is a load factor of hash tables. We compared our algorithm to LZend in experiments using real world repetitive texts.