2012 | OriginalPaper | Book chapter
Computing q-Gram Non-overlapping Frequencies on SLP Compressed Texts
Authors: Keisuke Goto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda
Published in: SOFSEM 2012: Theory and Practice of Computer Science
Publisher: Springer Berlin Heidelberg
Length-$q$ substrings, or $q$-grams, can represent important characteristics of text data, and determining the frequencies of all $q$-grams contained in the data is an important problem with many applications in the fields of data mining and machine learning. In this paper, we consider the problem of calculating the non-overlapping frequencies of all $q$-grams in a text given in compressed form, namely, as a straight line program (SLP). We show that the problem can be solved in $O(q^2 n)$ time and $O(qn)$ space, where $n$ is the size of the SLP. This generalizes and greatly improves previous work (Inenaga & Bannai, 2009), which solved the problem only for $q = 2$ in $O(n^4 \log n)$ time and $O(n^3)$ space.
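
To make the problem statement concrete, the following is a minimal baseline sketch in Python. It is not the algorithm of the paper: it first decompresses the SLP and then counts on the plain text, which can take time exponential in $n$ and is exactly what the compressed-domain approach avoids. It assumes an SLP given as a grammar in which every variable derives either a single character or the concatenation of two variables, and it takes the non-overlapping frequency of a $q$-gram to be the maximum number of pairwise non-overlapping occurrences, which a greedy left-to-right scan attains. The names `expand_slp`, `nonoverlapping_qgram_frequencies`, and the rule encoding are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def expand_slp(rules, start):
    """Decompress an SLP into the single string it derives.

    Assumed (hypothetical) encoding: rules[var] is either a one-character
    string or a pair (left_var, right_var).
    """
    cache = {}

    def derive(var):
        if var not in cache:
            rhs = rules[var]
            if isinstance(rhs, str):
                cache[var] = rhs
            else:
                cache[var] = derive(rhs[0]) + derive(rhs[1])
        return cache[var]

    return derive(start)


def nonoverlapping_qgram_frequencies(text, q):
    """Count, for every q-gram, a maximum set of non-overlapping occurrences.

    Greedy rule: scanning left to right, accept an occurrence only if it
    starts at or after the end of the last accepted occurrence of the
    same q-gram.
    """
    if q <= 0:
        raise ValueError("q must be positive")
    counts = defaultdict(int)
    next_free = {}  # q-gram -> earliest start allowed for its next occurrence
    for i in range(len(text) - q + 1):
        gram = text[i:i + q]
        if i >= next_free.get(gram, 0):
            counts[gram] += 1
            next_free[gram] = i + q
    return dict(counts)


# Toy SLP deriving "ababab": X3 -> ab, X4 -> X3 X3, X5 -> X4 X3.
rules = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),
    "X4": ("X3", "X3"),
    "X5": ("X4", "X3"),
}
text = expand_slp(rules, "X5")
freq_q2 = nonoverlapping_qgram_frequencies(text, 2)
freq_q3 = nonoverlapping_qgram_frequencies(text, 3)
```

In this toy example the 3-gram `aba` occurs at positions 0 and 2 of `ababab`, but the two occurrences overlap, so its non-overlapping frequency is 1 rather than 2.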