2013 | OriginalPaper | Book chapter
External Memory Generalized Suffix and LCP Arrays Construction
Authors: Felipe A. Louza, Guilherme P. Telles, Cristina Dutra De Aguiar Ciferri
Published in: Combinatorial Pattern Matching
Publisher: Springer Berlin Heidelberg
A suffix array is a data structure that, together with the LCP array, allows many string processing problems to be solved very efficiently. In this article we introduce eGSA, the first external memory algorithm to construct both generalized suffix and LCP arrays for sets of strings. Our algorithm relies on a combination of buffers, induced sorting and a heap. Performance tests with real DNA sequence sets of size up to 8.5 GB showed that eGSA can indeed be applied to sets of large sequences with efficient running time on a low-cost machine. Compared to eSAIS, the algorithm that most closely resembles eGSA in purpose, eGSA reduced the time spent to construct the arrays by a factor of 2.5 to 4.8.
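To make the two output structures concrete, the following is a minimal in-memory sketch of what a generalized suffix array and its LCP array contain for a small set of strings. It is not the eGSA algorithm: it uses plain comparison sorting and no external memory, buffers, induced sorting or heap. The function names `build_gsa_lcp` and `common_prefix_len`, the `(string_index, offset)` representation, and the tie-breaking rule (equal suffixes ordered by string index) are assumptions made for illustration.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_gsa_lcp(strings):
    """Naive generalized suffix array and LCP array for a list of strings.

    Returns (gsa, lcp) where gsa[k] = (string_index, offset) identifies the
    k-th smallest suffix over all strings, and lcp[k] is the longest common
    prefix length of the suffixes at ranks k-1 and k (lcp[0] = 0).

    Assumption: a suffix that is a prefix of another sorts first, and equal
    suffixes from different strings are ordered by string index.
    """
    # Materialise every non-empty suffix; only viable for small inputs.
    suffixes = [(s[j:], i, j) for i, s in enumerate(strings) for j in range(len(s))]
    suffixes.sort()  # tuple order: suffix text, then string index, then offset

    gsa = [(i, j) for _, i, j in suffixes]
    lcp = [0] * len(suffixes)
    for k in range(1, len(suffixes)):
        lcp[k] = common_prefix_len(suffixes[k - 1][0], suffixes[k][0])
    return gsa, lcp


if __name__ == "__main__":
    gsa, lcp = build_gsa_lcp(["GATA", "TAGA"])
    for (i, j), l in zip(gsa, lcp):
        print(i, j, l, ["GATA", "TAGA"][i][j:])
```

This sketch copies every suffix, so its memory use grows quadratically with string length. That cost is the reason inputs of the size reported above call for a dedicated external memory construction.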