2006 | OriginalPaper | Buchkapitel
Dotted Suffix Trees A Structure for Approximate Text Indexing
verfasst von : Luís Pedro Coelho, Arlindo L. Oliveira
Erschienen in: String Processing and Information Retrieval
Verlag: Springer Berlin Heidelberg
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
In this work, the problem we address is text indexing for approximate matching. Given a text
$\mathcal{T}$
which undergoes some preprocessing to generate an index, we can later query this index to identify the places where a string occurs up to a certain number of errors
k
(edition distance). The indexing structure occupies space
$\mathcal{O}(n\log^kn)$
in the average case, independent of alphabet size. This structure can be used to report the existence of a match with
k
errors in
$\mathcal{O}(3^k m^{k+1})$
and to report the occurrences in
$\mathcal{O}(3^k m^{k+1} + \mbox{\it ed})$
time, where
m
is the length of the pattern and
ed
and the number of matching edit scripts. The construction of the structure has time bound by
$\mathcal{O}(kN|\Sigma|)$
, where
N
is the number of nodes in the index and |Σ| the alphabet size.