2023 | Original Paper | Book chapter

CSA-MEM: Enhancing Circular DNA Multiple Alignment Through Text Indexing Algorithms

Written by: André Salgado, Francisco Fernandes, Ana Teresa Freitas

Published in: Bioinformatics Research and Applications

Publisher: Springer Nature Singapore

Abstract

In bioinformatics, comparing DNA sequences is essential for tasks such as phylogenetic identification, comparative genomics, and genome reconstruction. Methods for estimating sequence similarity have been applied successfully in this field, but applying them to circular genomic structures, which are common in nature, poses additional computational hurdles. In the advancing field of metagenomics, innovative circular DNA alignment algorithms are vital for accurately understanding the complexities of circular genomes. Aligning circular DNA is more intricate than aligning linear sequences: algorithms must account for circularity, which increases computational requirements and runtime. This paper proposes CSA-MEM, an efficient text indexing algorithm that identifies the most informative region at which to rotate and cut circular genomes, thus improving alignment accuracy. The algorithm uses a circular variation of the FM-Index and identifies the longest chain of non-repeated maximal subsequences common to a set of circular genomes, enabling the most adequate rotation and linearisation for multiple alignment. The effectiveness of the approach was validated on five sets of mitochondrial, viral and bacterial DNA. The results show that CSA-MEM significantly improves the efficiency of multiple sequence alignment, consistently achieving top scores compared to other state-of-the-art methods. This tool enables more realistic phylogenetic comparisons between species, facilitates the processing of large metagenomic data sets, and opens up new possibilities in comparative genomics.
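
The idea of rotating and cutting circular genomes at a shared, non-repeated region can be illustrated with a small sketch. This is not the CSA-MEM implementation: it uses a brute-force search over plain Python strings instead of a circular FM-Index, and it looks for a single longest anchor rather than a chain of maximal matches. All function names (`circular_occurrences`, `longest_unique_common_anchor`, `rotate_to_anchor`, `linearise`) are hypothetical and chosen only for this example.

```python
def circular_occurrences(seq, pattern):
    """Return the start positions of `pattern` in `seq` read as a circle."""
    n, m = len(seq), len(pattern)
    if m == 0 or m > n:
        return []
    # Append the first m-1 characters so matches may wrap around the end.
    doubled = seq + seq[:m - 1]
    return [i for i in range(n) if doubled.startswith(pattern, i)]


def longest_unique_common_anchor(seqs):
    """Brute-force search (assumption: toy-sized inputs only) for the longest
    substring that occurs exactly once in every circular sequence."""
    ref = seqs[0]
    doubled = ref + ref
    max_len = min(len(s) for s in seqs)
    for length in range(max_len, 0, -1):
        for start in range(len(ref)):
            candidate = doubled[start:start + length]
            if all(len(circular_occurrences(s, candidate)) == 1 for s in seqs):
                return candidate
    return None


def rotate_to_anchor(seq, anchor):
    """Rotate the circular sequence so that it starts with `anchor`."""
    positions = circular_occurrences(seq, anchor)
    if len(positions) != 1:
        raise ValueError("anchor must occur exactly once in the sequence")
    cut = positions[0]
    return seq[cut:] + seq[:cut]


def linearise(seqs):
    """Pick a common non-repeated anchor and cut every sequence at it."""
    anchor = longest_unique_common_anchor(seqs)
    if anchor is None:
        raise ValueError("no common non-repeated anchor found")
    return anchor, [rotate_to_anchor(s, anchor) for s in seqs]


if __name__ == "__main__":
    # Three arbitrary linearisations of the same toy circular sequence.
    base = "ACGTTGCAAT"
    genomes = [base, base[3:] + base[:3], base[7:] + base[:7]]
    anchor, rotated = linearise(genomes)
    print(anchor)
    for r in rotated:
        print(r)
```

After this step every sequence begins at the same region, so a standard linear multiple aligner can be applied to the rotated sequences. The brute-force search is far too slow for real genomes, which is the gap an index-based method is meant to close.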

Metadata
Title
CSA-MEM: Enhancing Circular DNA Multiple Alignment Through Text Indexing Algorithms
Written by
André Salgado
Francisco Fernandes
Ana Teresa Freitas
Copyright year
2023
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-99-7074-2_41