Top

Published in:

2011 | OriginalPaper | Chapter

Faster Approximate Pattern Matching in Compressed Repetitive Texts

Authors : Travis Gagie, Paweł Gawrychowski, Simon J. Puglisi

Published in: Algorithms and Computation

Publisher: Springer Berlin Heidelberg

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Motivated by the imminent growth of massive, highly redundant genomic databases we study the problem of compressing a string database while simultaneously supporting fast random access, substring extraction and pattern matching to the underlying string(s).

Bille et al. (2011) recently showed how, given a straight-line program with

rules for a string

of length

, we can build an

$\ensuremath{\mathcal{O}\!\left( {r} \right)}$

-word data structure that allows us to extract any substring

[

] in

$\ensuremath{\mathcal{O}\!\left( {\log n + j - i} \right)}$

time. They also showed how, given a pattern

of length

and an edit distance

≤

, their data structure supports finding all

occ

approximate matches to

$\ensuremath{\mathcal{O}\!\left( {r (\min (m k, k^4 + m) + \log n) + \ensuremath{\mathsf{occ}}} \right)}$

time. Rytter (2003) and Charikar et al. (2005) showed that

is always at least the number

of phrases in the LZ77 parse of

, and gave algorithms for building straight-line programs with

$\ensuremath{\mathcal{O}\!\left( {z \log n} \right)}$

rules. In this paper we give a simple

$\ensuremath{\mathcal{O}\!\left( {z \log n} \right)}$

-word data structure that takes the same time for substring extraction but only

$\ensuremath{\mathcal{O}\!\left( {z (\min (m k, k^4 + m)) + \ensuremath{\mathsf{occ}}} \right)}$

time for approximate pattern matching.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

previous chapter Parameterized Complexity of the Firefighter Problem

next chapter A New Algorithm for the Characteristic String Problem under Loose Similarity Criteria

Title: Faster Approximate Pattern Matching in Compressed Repetitive Texts
Authors: Travis Gagie
Paweł Gawrychowski
Simon J. Puglisi
Publisher: Springer Berlin Heidelberg
Book: Algorithms and Computation
Print ISBN: 978-3-642-25590-8

Electronic ISBN: 978-3-642-25591-5

Copyright Year: 2011
DOI: https://doi.org/10.1007/978-3-642-25591-5_67

Springer Professional

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner