2012 | Original Paper | Book chapter
Speeding Up q-Gram Mining on Grammar-Based Compressed Texts
Authors: Keisuke Goto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda
Published in: Combinatorial Pattern Matching
Publisher: Springer Berlin Heidelberg
We present an efficient algorithm for calculating $q$-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP $\mathcal{T}$ of size $n$ that represents string $T$, the algorithm computes the occurrence frequencies of all $q$-grams in $T$, by reducing the problem to the weighted $q$-gram frequencies problem on a trie-like structure of size $m = |T|-\mathit{dup}(q,\mathcal{T})$, where $\mathit{dup}(q,\mathcal{T})$ is a quantity that represents the amount of redundancy that the SLP captures with respect to $q$-grams. The reduced problem can be solved in linear time. Since $m = O(qn)$, the running time of our algorithm is $O(\min\{|T|-\mathit{dup}(q,\mathcal{T}),qn\})$, improving our previous $O(qn)$ algorithm when $q = \Omega(|T|/n)$.
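To make the setting concrete, the following is a minimal Python sketch of counting $q$-grams directly on an SLP without decompressing it. It is not the trie-based algorithm of this chapter, whose details the abstract does not give. It illustrates one common way to stay within $O(qn)$ characters of work: for each rule $X_i \to X_\ell X_r$, only the $q$-grams that cross the boundary between $X_\ell$ and $X_r$ are counted, weighted by how often $X_i$ occurs in the derivation of $T$. The rule encoding (a list whose entries are either a single character or a pair of earlier indices) and the names `qgram_frequencies`, `expand`, and `vocc` are assumptions made for this illustration, and whether the sketch matches the authors' earlier $O(qn)$ algorithm in detail is likewise an assumption.

```python
from collections import Counter


def expand(rules):
    """Decompress the SLP (for checking only; may be exponentially long)."""
    s = []
    for rule in rules:
        s.append(rule if isinstance(rule, str) else s[rule[0]] + s[rule[1]])
    return s[-1]


def qgram_frequencies(rules, q):
    """Count all q-grams of the string derived by the last rule.

    Assumed encoding: rules[i] is a single character, or a pair (l, r)
    with l, r < i meaning X_i -> X_l X_r.
    """
    k = q - 1
    n = len(rules)
    pre = [""] * n  # first min(k, |X_i|) characters of X_i
    suf = [""] * n  # last  min(k, |X_i|) characters of X_i
    for i, rule in enumerate(rules):
        if isinstance(rule, str):
            pre[i] = rule[:k]
            suf[i] = rule[max(0, len(rule) - k):]
        else:
            l, r = rule
            pre[i] = (pre[l] + pre[r])[:k]
            joined = suf[l] + suf[r]
            suf[i] = joined[max(0, len(joined) - k):]

    # vocc[i]: number of nodes labelled X_i in the derivation tree of T.
    vocc = [0] * n
    vocc[-1] = 1
    for i in range(n - 1, -1, -1):
        if not isinstance(rules[i], str):
            l, r = rules[i]
            vocc[l] += vocc[i]
            vocc[r] += vocc[i]

    freq = Counter()
    for i, rule in enumerate(rules):
        if vocc[i] == 0:
            continue  # variable unreachable from the start symbol
        if isinstance(rule, str):
            if q == 1:  # 1-grams live in the leaves
                freq[rule] += vocc[i]
        else:
            l, r = rule
            # Every length-q window of t spans the X_l / X_r boundary.
            t = suf[l] + pre[r]
            for j in range(len(t) - q + 1):
                freq[t[j:j + q]] += vocc[i]
    return freq


# Hypothetical example SLP deriving "abaababa".
rules = ["a", "b", (0, 1), (2, 0), (3, 2), (4, 3)]
print(expand(rules), dict(qgram_frequencies(rules, 2)))
```

Each boundary string has length at most $2(q-1)$, so the total text inspected is $O(qn)$. The sketch tallies substrings in a hash map for brevity, which costs extra time per $q$-gram; it does not attempt the linear-time weighted counting that the abstract refers to.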