MPSAGA: a matrix-based pair-wise sequence alignment algorithm for global alignment with position based sequence representation

Lakhani, Jyoti; Khunteta, Ajay; Choudhary, Anupama; Harwani, Dharmesh

doi:10.1007/s12046-019-1141-x

MPSAGA: a matrix-based pair-wise sequence alignment algorithm for global alignment with position based sequence representation

Published: 29 June 2019

Volume 44, article number 171, (2019)
Cite this article

Sādhanā Aims and scope Submit manuscript

Jyoti Lakhani^1,2,
Ajay Khunteta²,
Anupama Choudhary³ &
…
Dharmesh Harwani ORCID: orcid.org/0000-0003-3296-5942⁴

180 Accesses
1 Citation
Explore all metrics

Abstract

The proposed algorithm is a novel matrix-based global pair-wise sequence alignment with a de novo sequence representation. Needleman–Wunsch, noblest, Emboss-Needle, ALIGN, LALIGN, FOGSAA, DIALIGN, ACANA, MUMmer, etc. are few other algorithms that are most commonly used for global pair-wise sequence alignment. Needleman–Wunsch algorithm is one of the most popular algorithms that provides the best possible pair-wise sequence alignment, but the algorithm output is associated with high time and space complexities. To resolve these complex issues, researchers have proposed several algorithms to reduce time and space complexities in the pair-wise sequence alignment. Most of these algorithms provide solutions, but compromise the optimal result in favor of plummeting time and space complexities. An attempt has been made in the present research to develop MPSAGA and a completely unique positional matrix (PM) based sequence representation to deal with the time and space complexities without compromising sequence alignment results (MPSAGA is in public domain available at https://github.com/JyotiLakhani1/MPSAGA). A benchmarking of the proposed algorithm has also been performed with other popular pair-wise sequence alignment algorithms with and without positional matrix-based sequence representation. The use of an integer instead of string data type and exclusive clustering method in MPSAGA with positional matrix based sequence representation resulted in a noteworthy reduction in the memory usage (space) and execution time in the pair-wise alignment of biological sequences.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

ReformAlign: improved multiple sequence alignments using a profile-based meta-alignment approach

Article Open access 07 August 2014

Pairwise alignment of nucleotide sequences using maximal exact matches

Article Open access 21 May 2019

Detect Anchor Points by Using Shared Near Neighbors for Multiple Sequence Alignment

References

Baker W, Broek A, Camon E, Hingamp P, Strek P, Stoesser G and Tuli M A 2000 The EMBL Nucleotide Sequence Database. Nucleic Acid Res. 28(1): 19–23
Article Google Scholar
Cochrane G, Aldebert P, Althorpe N, Andersson M, Baker W, Baldwin A, Bates K, Bhattacharyya S, Browne P, Van denBroek A et al 2006 EMBL Nucleotide Sequence Database: developments in 2005. Nucleic Acids Res. 34:10–15
Article Google Scholar
Benson D A, Karsch-Mizrachi I, Lipman D J, Ostell J and Sayers E W 2011 Genbank. Nucleic Acids Res. D39:32–37
Article Google Scholar
Benson D A, Karsch-Mizrachi I, Lipman D J, Ostell J and Wheeler D L 2007 GenBank. Nucleic Acids Res. 35:D21–D25
Article Google Scholar
Okubo K, Sugawara H, Gojobori T and Tateno Y 2006 DDBJ in preparation for overview of research activities behind data submissions. Nucleic Acids Res. 34:6–9
Article Google Scholar
Tateno Y, Imanishi T, Miyazaki S, Fukami-Kobayashi K, Saitou N, Sugawara H and Gojobori T 2002 DNA Data Bank of Japan (DDBJ) for genome scale research in life science. Nucleic Acids Res. 30(1):27–30
Article Google Scholar
Schuler G D, Epstein J A, Ohkawa H and Kans J A 1966 Entrez: molecular biology database and retrieval system. Methods Enzymol. 266:141–162
Article Google Scholar
Pearson W R and Lipman D J 1988 Improved Tools for Biological Sequence Comparison. Proceedings of the National Academy of Sciences of the United States of America, pp.2444–2448
Article Google Scholar
Altschul S F, Gish W, Miller W, Myers E W and Lipman D J 1990 Basic local alignment search tool. J. Mol. Biol. 215 (3): 403–410
Article Google Scholar
Kumar S, Tamura K and Nei M 1994 MEGA: Molecular Evolutionary Genetics Analysis software for microcomputers. Comput. Appl. Biosci. 10(2):189–191
Google Scholar
Kumar S, Stecher G and Tamura K 2016 MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33(7):1870–1874
Article Google Scholar
Lakhani J, Khunteta A, Chowdhary A and Harwani D 2016 Auto-Evolving Clusters based on Rejection and Migration. In: Proceedings of the International Conference on Advances in Information Communication Technology & Computing (AICTC ‘16) Bishnoi S K, Kuri M, Goar V (Eds.). ACM, New York, NY, USA,, Article 98, 6 pages
Smith T F and Waterman M S 1981 Identification of common molecular sub sequences. J. Mol. Biol. 147:195–197
Article Google Scholar
Needleman B and Wunsch D 1970 A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3): 443–53
Article Google Scholar
Chakraborty A and Bandyopadhyay S 2013 FOGSAA: fast optimal global sequence alignment algorithm. Sci. Rep. 3:1746
Article Google Scholar
Morgenstern B 2004 DIALIGN: Multiple DNA and protein sequence alignment at BiBiServ. Nucleic Acids Res. 32:W33–W36
Article Google Scholar
Kurtz S, Phillippy A, Delcher A L, Smoot M, Shumway M, Antonescu C and Salzberg S L 2004 Open source MUMmer 3.0, Versatile and open software for comparing large genomes. Genome Biol. 5: R12
Delcher A L, Phillippy A, Carlton J and Salzberg S L 2002 MUMmer 2.1, NUCmer, Fast algorithms for large-scale genome alignment and comparision. Nucleic Acids Res. 30(11): 2478–2483
Delcher A L, Kasif S, Fleischmann R D, Peterson J, White O and Salzberg S L 1999 MUMmer 1.0, Alignment of whole genomes. Nucleic Acids Res. 27(11): 2369–2376
Article Google Scholar
Weichun H, David M, Umbach and Leping Li 2006 Accurate anchoring alignment of divergent sequences. Bioinformatics. 22:29–34
Article Google Scholar
Wilkinson L 1999 Dot plots. The American Statistician. 53:276–281
Google Scholar
Stamm M, Staritzbichler R, Khafizov K and Forrest L R 2014 AlignMe-a membrane protein sequence alignment web server. Nucleic Acids Res.42(W1):W246–251
Article Google Scholar
Stamm M, Staritzbichler R, Khafizov K and Forrest L R 2013 Alignment of Helical Membrane Protein Sequences Using AlignMe. PLoS ONE. 8(3):e57731
Article Google Scholar
Khafizov K, Staritzbichler R, Stamm M and Forrest L R 2010 A study of the evolution of inverted-topology repeats from LeuT-fold transporters using AlignMe. Biochemistry 49(50):10702–10713
Article Google Scholar
Gentleman R, Carey V, Huber W, Irizarry R and Dudoit S 2005 Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer
Gentleman R 2008 Programming for Bioinformatics. Chapman and Hall, CRC Press, Boca Raton
Book Google Scholar
Hahne F, Huber W, Gentleman R and Falcon S 2008 Bioconductor Case Studies. Springer
Vogt N 2016 NBLAST: a similarity search for neurons. Nature Methods 13:717
Article Google Scholar
Dumontier M and Hogue C W V 2002 NBLAST: a cluster variant of BLAST for NxN comparisons. BMC Bioinfo. 3:13
Article Google Scholar
Smith K 2014 A brief history of NCBI’s formation and growth. The NCBI Handbook. 2nd edition [Internet] ix-xiv

Download references

Author information

Authors and Affiliations

Department of Computer Science, Maharaja Ganga Singh University, Bikaner, 334 004, India
Jyoti Lakhani
Department of Computer Engineering, Poornima University, Jaipur, India
Jyoti Lakhani & Ajay Khunteta
Department of Computer Science, Keen College, Bikaner, India
Anupama Choudhary
Department of Microbiology, Maharaja Ganga Singh University, Bikaner, 334 004, India
Dharmesh Harwani

Authors

Jyoti Lakhani
View author publications
You can also search for this author in PubMed Google Scholar
Ajay Khunteta
View author publications
You can also search for this author in PubMed Google Scholar
Anupama Choudhary
View author publications
You can also search for this author in PubMed Google Scholar
Dharmesh Harwani
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Dharmesh Harwani.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 460 kb)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Lakhani, J., Khunteta, A., Choudhary, A. et al. MPSAGA: a matrix-based pair-wise sequence alignment algorithm for global alignment with position based sequence representation. Sādhanā 44, 171 (2019). https://doi.org/10.1007/s12046-019-1141-x

Download citation

Received: 09 May 2018
Revised: 07 April 2019
Accepted: 16 April 2019
Published: 29 June 2019
DOI: https://doi.org/10.1007/s12046-019-1141-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

MPSAGA: a matrix-based pair-wise sequence alignment algorithm for global alignment with position based sequence representation

Abstract

Access this article

Similar content being viewed by others

ReformAlign: improved multiple sequence alignments using a profile-based meta-alignment approach

Pairwise alignment of nucleotide sequences using maximal exact matches

Detect Anchor Points by Using Shared Near Neighbors for Multiple Sequence Alignment

References

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Supplementary material 1 (DOCX 460 kb)

Rights and permissions

About this article

Cite this article

Keywords

Navigation

MPSAGA: a matrix-based pair-wise sequence alignment algorithm for global alignment with position based sequence representation

Abstract

Access this article

Similar content being viewed by others

ReformAlign: improved multiple sequence alignments using a profile-based meta-alignment approach

Pairwise alignment of nucleotide sequences using maximal exact matches

Detect Anchor Points by Using Shared Near Neighbors for Multiple Sequence Alignment

References

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Supplementary material 1 (DOCX 460 kb)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation