
2017 | OriginalPaper | Book chapter

1. Algorithms for Indexing Highly Similar DNA Sequences

Authors: Nadia Ben Nsira, Thierry Lecroq, Mourad Elloumi

Published in: Algorithms for Next-Generation Sequencing Data

Publisher: Springer International Publishing


Abstract

The amount of available digital data grows at an extraordinary pace. This is the case for DNA sequences produced by high-throughput Next Generation Sequencing (NGS) technologies. It is therefore possible to sequence several genomes of organisms, and one project (http://www.1000genomes.org) now provides about 2500 individual human genomes, each a sequence of more than three billion characters (A, C, G, T).


Footnotes
2
In this exposition all logarithms are in base 2 unless stated otherwise.
 
Metadata
Title
Algorithms for Indexing Highly Similar DNA Sequences
Authors
Nadia Ben Nsira
Thierry Lecroq
Mourad Elloumi
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-59826-0_1