Skip to main content
Erschienen in: Soft Computing 20/2020

12.04.2020 | Methodologies and Application

A bi-objective function optimization approach for multiple sequence alignment using genetic algorithm

verfasst von: Biswanath Chowdhury, Gautam Garai

Erschienen in: Soft Computing | Ausgabe 20/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Multiple sequence alignment (MSA) is characterized as a very high computational complex problem. Therefore, MSA problem cannot be solved by exhaustive methods. Nowadays, MSA is being solved by optimizing more than one objective simultaneously. In this paper, we propose a new genetic algorithm based alignment technique, named bi-objective sequence alignment using genetic algorithm (BSAGA). The novelty of this approach is its selection process. One part of the population is selected based on the Sum of Pair, and rest is selected based on Total Conserve Columns. We applied integer-based chromosomal coding to represent only the gap positions in an alignment. Such representation improves the search technique to reach an optimum even for longer sequences. We tested and compared the alignment score of BSAGA with other relevant alignment techniques on BAliBASE and SABmark. The BSAGA shows better performance than others do, which was further proved by the Wilcoxon sign test.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
Zurück zum Zitat Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucl Acids Res 28:235–242CrossRef Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucl Acids Res 28:235–242CrossRef
Zurück zum Zitat Chuong BD, Kazutaka K (2008) Protein multiple sequence alignment. Methods Mol Biol 484:379–413CrossRef Chuong BD, Kazutaka K (2008) Protein multiple sequence alignment. Methods Mol Biol 484:379–413CrossRef
Zurück zum Zitat Corder GW (2009) Foreman DI: nonparametric statistics for non-statisticians: a step-by-step approach. Wiley, New YorkMATHCrossRef Corder GW (2009) Foreman DI: nonparametric statistics for non-statisticians: a step-by-step approach. Wiley, New YorkMATHCrossRef
Zurück zum Zitat Deb K et al (2002) A fast and elitist multiobjective genetic algorithm: Nsga-II. IEEE Trans Evol Comput 6:182–197CrossRef Deb K et al (2002) A fast and elitist multiobjective genetic algorithm: Nsga-II. IEEE Trans Evol Comput 6:182–197CrossRef
Zurück zum Zitat DeRonne KW, Karypis G (2013) Pareto optimal pairwise sequence alignment. IEEE/ACM Trans Comput Biol Bioinform 10:481–493CrossRef DeRonne KW, Karypis G (2013) Pareto optimal pairwise sequence alignment. IEEE/ACM Trans Comput Biol Bioinform 10:481–493CrossRef
Zurück zum Zitat Do CB, Mahabhashyam MS, Brudno M, Batzoglou S (2005) ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Res 15:330–340CrossRef Do CB, Mahabhashyam MS, Brudno M, Batzoglou S (2005) ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Res 15:330–340CrossRef
Zurück zum Zitat Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl Acids Res 32:1792–1797CrossRef Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl Acids Res 32:1792–1797CrossRef
Zurück zum Zitat Ehrgott M (2005) Multicriteria optimization. Springer, BerlinMATH Ehrgott M (2005) Multicriteria optimization. Springer, BerlinMATH
Zurück zum Zitat Eusuff M, Lansey K, Pasha F (2006) Shuffled frog-leaping algorithm: a memetic meta-heuristic for discrete optimization. Eng Optim 38:129–154MathSciNetCrossRef Eusuff M, Lansey K, Pasha F (2006) Shuffled frog-leaping algorithm: a memetic meta-heuristic for discrete optimization. Eng Optim 38:129–154MathSciNetCrossRef
Zurück zum Zitat Feng DF, Doolittle RF (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol 25:351–360CrossRef Feng DF, Doolittle RF (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol 25:351–360CrossRef
Zurück zum Zitat Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, BostonMATH Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, BostonMATH
Zurück zum Zitat Gondro C, Kinghorn BP (2007) A simple genetic algorithm for multiple sequence alignment. Genet Mol Res 6:964–982 Gondro C, Kinghorn BP (2007) A simple genetic algorithm for multiple sequence alignment. Genet Mol Res 6:964–982
Zurück zum Zitat Gotoh O (1996) Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. J Mol Biol 264:823–838CrossRef Gotoh O (1996) Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. J Mol Biol 264:823–838CrossRef
Zurück zum Zitat Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci 89:10915–10919CrossRef Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci 89:10915–10919CrossRef
Zurück zum Zitat Heringa J, Taylor WR (1997) Three-dimensional domain duplication, swapping and stealing. Curr Opin Struct Biol 7:416–421CrossRef Heringa J, Taylor WR (1997) Three-dimensional domain duplication, swapping and stealing. Curr Opin Struct Biol 7:416–421CrossRef
Zurück zum Zitat Hogeweg P, Hesper B (1984) The alignment of sets of sequences and the construction of phylogenetic trees: an integrated method. J Mol Evol 20:175–186CrossRef Hogeweg P, Hesper B (1984) The alignment of sets of sequences and the construction of phylogenetic trees: an integrated method. J Mol Evol 20:175–186CrossRef
Zurück zum Zitat Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
Zurück zum Zitat Karaboga D (2005) An idea based on honey bee swarm for numerical optimization. Technical report-tr06. Erciyes University, Engineering Faculty, Computer Engineering Department Karaboga D (2005) An idea based on honey bee swarm for numerical optimization. Technical report-tr06. Erciyes University, Engineering Faculty, Computer Engineering Department
Zurück zum Zitat Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucl Acids Res 30:3059–3066CrossRef Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucl Acids Res 30:3059–3066CrossRef
Zurück zum Zitat Katoh K, Kuma K, Toh H, Miyata T (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucl Acids Res 33:511–518CrossRef Katoh K, Kuma K, Toh H, Miyata T (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucl Acids Res 33:511–518CrossRef
Zurück zum Zitat Kaya M, Sarhan A, Alhajj R (2014) Multiple sequence alignment with affine gap by using multi-objective genetic algorithm. Comput Methods Prog Biomed 114:38–49CrossRef Kaya M, Sarhan A, Alhajj R (2014) Multiple sequence alignment with affine gap by using multi-objective genetic algorithm. Comput Methods Prog Biomed 114:38–49CrossRef
Zurück zum Zitat Kemena C, Taly JF, Kleinjung J, Notredame C (2011) STRIKE: evaluation of protein MSAs using a single 3D structure. Bioinformatics 27:3385–3391CrossRef Kemena C, Taly JF, Kleinjung J, Notredame C (2011) STRIKE: evaluation of protein MSAs using a single 3D structure. Bioinformatics 27:3385–3391CrossRef
Zurück zum Zitat Lam AY, Li VO (2010) Chemical-reaction-inspired metaheuristic for optimization. IEEE Trans Evol Comput 14:381–399CrossRef Lam AY, Li VO (2010) Chemical-reaction-inspired metaheuristic for optimization. IEEE Trans Evol Comput 14:381–399CrossRef
Zurück zum Zitat Lassmann T, Frings O, Sonnhammer ELL (2009) Kalign2: high performance multiple alignment of protein and nucleotide sequences allowing external features. Nucl Acids Res 37:858–865CrossRef Lassmann T, Frings O, Sonnhammer ELL (2009) Kalign2: high performance multiple alignment of protein and nucleotide sequences allowing external features. Nucl Acids Res 37:858–865CrossRef
Zurück zum Zitat Lee ZH, Su SF, Chuang CC, Liu KH (2008) Genetic algorithm with ant colony optimization (GA-ACO) for multiple sequence alignment. Appl Soft Comput 8:55–78CrossRef Lee ZH, Su SF, Chuang CC, Liu KH (2008) Genetic algorithm with ant colony optimization (GA-ACO) for multiple sequence alignment. Appl Soft Comput 8:55–78CrossRef
Zurück zum Zitat Lipman DJ, Altschul SF, Kececioglu JD (1989) A tool for multiple sequence alignment. Proc Natl Acad Sci 86:4412–4415CrossRef Lipman DJ, Altschul SF, Kececioglu JD (1989) A tool for multiple sequence alignment. Proc Natl Acad Sci 86:4412–4415CrossRef
Zurück zum Zitat Liu Y, Schmidt B, Maskell DL (2010) MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities. Bioinformatics 26:1958–1964CrossRef Liu Y, Schmidt B, Maskell DL (2010) MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities. Bioinformatics 26:1958–1964CrossRef
Zurück zum Zitat Loytynoja A, Goldman N (2005) An algorithm for progressive multiple alignment of sequences with insertions. Proc Natl Acad Sci 102:10557–10562CrossRef Loytynoja A, Goldman N (2005) An algorithm for progressive multiple alignment of sequences with insertions. Proc Natl Acad Sci 102:10557–10562CrossRef
Zurück zum Zitat Miller BL, Golberg DE (1995) Genetic algorithms, tournament selection, and the effects of noise. Complex Syst 9:193–212MathSciNet Miller BL, Golberg DE (1995) Genetic algorithms, tournament selection, and the effects of noise. Complex Syst 9:193–212MathSciNet
Zurück zum Zitat Mount DW (2004) Bioinformatics: sequence and genome analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor Mount DW (2004) Bioinformatics: sequence and genome analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor
Zurück zum Zitat Naznin F, Sarker R, Essam D (2011) Vertical decomposition with genetic algorithm for multiple sequence alignment. BMC Bioinform 12:353CrossRef Naznin F, Sarker R, Essam D (2011) Vertical decomposition with genetic algorithm for multiple sequence alignment. BMC Bioinform 12:353CrossRef
Zurück zum Zitat Naznin F, Sarker R, Essam D (2012) Progressive alignment method using genetic algorithm for multiple sequence alignment. IEEE Trans Evol Comput 16:615–631CrossRef Naznin F, Sarker R, Essam D (2012) Progressive alignment method using genetic algorithm for multiple sequence alignment. IEEE Trans Evol Comput 16:615–631CrossRef
Zurück zum Zitat Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453CrossRef Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453CrossRef
Zurück zum Zitat Notredame C (2002) Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 3:131–144CrossRef Notredame C (2002) Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 3:131–144CrossRef
Zurück zum Zitat Notredame C, Higgins DG (1996) SAGA: sequence alignment by genetic algorithm. Nucl Acids Res 24:1515–1524CrossRef Notredame C, Higgins DG (1996) SAGA: sequence alignment by genetic algorithm. Nucl Acids Res 24:1515–1524CrossRef
Zurück zum Zitat Notredame C, Higgins DG, Heringa J (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302:205–217CrossRef Notredame C, Higgins DG, Heringa J (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302:205–217CrossRef
Zurück zum Zitat Ortuño FM, Valenzuela O, Rojas F, Pomares H, Florido JP, Urquiza JM, Rojas I (2013) Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns. Bioinformatics 29:2112–2121CrossRef Ortuño FM, Valenzuela O, Rojas F, Pomares H, Florido JP, Urquiza JM, Rojas I (2013) Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns. Bioinformatics 29:2112–2121CrossRef
Zurück zum Zitat Pei J, Grishin NV (2006) MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information. Nucl Acids Res 34:4364–4374CrossRef Pei J, Grishin NV (2006) MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information. Nucl Acids Res 34:4364–4374CrossRef
Zurück zum Zitat Pei J, Grishin NV (2007) PROMALS: towards accurate multiple sequence alignments of distantly related proteins. Bioinformatics 23:802–808CrossRef Pei J, Grishin NV (2007) PROMALS: towards accurate multiple sequence alignments of distantly related proteins. Bioinformatics 23:802–808CrossRef
Zurück zum Zitat Rahman RA, Ramli R, Jamari Z, Ku-Mahamud KR (2016) Evolutionary algorithm with roulette-tournament selection for solving aquaculture diet formulation. Math Probl Eng 2016:1–10 Rahman RA, Ramli R, Jamari Z, Ku-Mahamud KR (2016) Evolutionary algorithm with roulette-tournament selection for solving aquaculture diet formulation. Math Probl Eng 2016:1–10
Zurück zum Zitat Roshan U, Livesay DR (2006) Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics 22:2715–2721CrossRef Roshan U, Livesay DR (2006) Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics 22:2715–2721CrossRef
Zurück zum Zitat Rubio-Largo Á, Vega-Rodríguez MA, González-Álvarez DL (2016a) Hybrid multiobjective artificial bee colony for multiple sequence alignment. Appl Soft Comput 41:157–168CrossRef Rubio-Largo Á, Vega-Rodríguez MA, González-Álvarez DL (2016a) Hybrid multiobjective artificial bee colony for multiple sequence alignment. Appl Soft Comput 41:157–168CrossRef
Zurück zum Zitat Rubio-Largo Á, Vega-Rodríguez MA, González-Álvarez DL (2016b) A hybrid multiobjective memetic metaheuristic for multiple sequence alignment. IEEE Trans Evol Comput 20:499–514CrossRef Rubio-Largo Á, Vega-Rodríguez MA, González-Álvarez DL (2016b) A hybrid multiobjective memetic metaheuristic for multiple sequence alignment. IEEE Trans Evol Comput 20:499–514CrossRef
Zurück zum Zitat Sean RE (2002) A memory-efficient dynamic programming algorithm for optimal alignment of sequence to an RNA secondary structure. BMC Bioinform 3:13CrossRef Sean RE (2002) A memory-efficient dynamic programming algorithm for optimal alignment of sequence to an RNA secondary structure. BMC Bioinform 3:13CrossRef
Zurück zum Zitat Shyu C, Sheneman L, Foster JA (2004) Multiple sequence alignment with evolutionary computation. Genet Progr Evolvable Mach 5:121–144CrossRef Shyu C, Sheneman L, Foster JA (2004) Multiple sequence alignment with evolutionary computation. Genet Progr Evolvable Mach 5:121–144CrossRef
Zurück zum Zitat Sievers F et al (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7:539CrossRef Sievers F et al (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7:539CrossRef
Zurück zum Zitat Smith TF, Waterman MS (1981) Identification of common molecular sequences. J Mol Biol 147:195–197CrossRef Smith TF, Waterman MS (1981) Identification of common molecular sequences. J Mol Biol 147:195–197CrossRef
Zurück zum Zitat Subramanian AR, Kaufmann M, Morgenstern B (2008) DIALIGN-TX: Greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms Mol Biol 3:1–11CrossRef Subramanian AR, Kaufmann M, Morgenstern B (2008) DIALIGN-TX: Greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms Mol Biol 3:1–11CrossRef
Zurück zum Zitat Taheri J, Zomaya AY (2009) RBT-GA: a novel metaheuristic for solving the multiple sequence alignment problem. BMC Genom 10(Suppl 1):S10CrossRef Taheri J, Zomaya AY (2009) RBT-GA: a novel metaheuristic for solving the multiple sequence alignment problem. BMC Genom 10(Suppl 1):S10CrossRef
Zurück zum Zitat Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucl Acids Res 22:4673–4680CrossRef Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucl Acids Res 22:4673–4680CrossRef
Zurück zum Zitat Thompson JD, Plewniak F, Poch O (1999) BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 15:87–88CrossRef Thompson JD, Plewniak F, Poch O (1999) BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 15:87–88CrossRef
Zurück zum Zitat Thompson JD, Koehl P, Ripp R, Poch O (2005) BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins 61:127–136CrossRef Thompson JD, Koehl P, Ripp R, Poch O (2005) BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins 61:127–136CrossRef
Zurück zum Zitat Thompson JD, Linard B, Lecompte D, Poch O (2011) A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives. PLoS ONE 6:1–14CrossRef Thompson JD, Linard B, Lecompte D, Poch O (2011) A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives. PLoS ONE 6:1–14CrossRef
Zurück zum Zitat Van Walle I, Lasters I, Wyns L (2005) SABmark—a benchmark for sequence alignment that covers the entire known fold space. Bioinformatics 21:1267–1268CrossRef Van Walle I, Lasters I, Wyns L (2005) SABmark—a benchmark for sequence alignment that covers the entire known fold space. Bioinformatics 21:1267–1268CrossRef
Zurück zum Zitat Wadud MS, Islam MR, Kundu N, Kabir MR (2018) Multiple sequence alignment using chemical reaction optimization algorithm. Int Conf Intell Syst Des Appl 941:1065–1074 Wadud MS, Islam MR, Kundu N, Kabir MR (2018) Multiple sequence alignment using chemical reaction optimization algorithm. Int Conf Intell Syst Des Appl 941:1065–1074
Zurück zum Zitat Wang L, Jiang T (1994) On the complexity of multiple sequence alignment. J Comput Biol 1:337–348CrossRef Wang L, Jiang T (1994) On the complexity of multiple sequence alignment. J Comput Biol 1:337–348CrossRef
Zurück zum Zitat Yamada S, Gotoh O, Yamana H (2006) Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost. BMC Bioinform 7:524CrossRef Yamada S, Gotoh O, Yamana H (2006) Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost. BMC Bioinform 7:524CrossRef
Zurück zum Zitat Zhou A, Qu BY, Li H, Zhao SZ, Suganthan PN, Zhang Q (2011) Multiobjective evolutionary algorithms: a survey of the state of the art. Swarm Evol Comput 1:32–49CrossRef Zhou A, Qu BY, Li H, Zhao SZ, Suganthan PN, Zhang Q (2011) Multiobjective evolutionary algorithms: a survey of the state of the art. Swarm Evol Comput 1:32–49CrossRef
Metadaten
Titel
A bi-objective function optimization approach for multiple sequence alignment using genetic algorithm
verfasst von
Biswanath Chowdhury
Gautam Garai
Publikationsdatum
12.04.2020
Verlag
Springer Berlin Heidelberg
Erschienen in
Soft Computing / Ausgabe 20/2020
Print ISSN: 1432-7643
Elektronische ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-020-04917-5

Weitere Artikel der Ausgabe 20/2020

Soft Computing 20/2020 Zur Ausgabe