Skip to main content
Top
Published in: The Journal of Supercomputing 11/2020

19-02-2020

NestMSA: a new multiple sequence alignment algorithm

Authors: Mohammed Kayed, Ahmed A. Elngar

Published in: The Journal of Supercomputing | Issue 11/2020

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Multiple sequence alignment (MSA) is a core problem in many applications. Various optimization algorithms such as genetic algorithm and particle swarm optimization (PSO) have been used to solve this problem, where all of them are adapted to work in the bioinformatics domain. This paper defines the MSA problem, suggests a novel MSA algorithm called ‘NestMSA’ and evaluates it in two domains: Web data extraction and removing different URLs with similar text (DUST). The suggested algorithm is inspired by the PSO optimization algorithm. It is not a generalization of a two-sequence alignment algorithm as it processes all the sequences at the same time. Therefore, it looks globally at the same time on all sequences. Different from other PSO-based alignment algorithms, swarm particles in the proposed NestMSA algorithm are nested inside the sequences and communicated together to align them. Therefore, global maximum is guaranteed in our algorithm. Furthermore, this work suggests a new objective function which both maximizes the number of matched characters and minimizes the number of gaps inserted in the sequences. The running time complexity and the efficiency of NestMSA are addressed in this paper. The experiments show an encouraging result as it outperforms the two approaches DCA and TEX in the Web data extraction domain (95% and 96% of recall and precision, respectively). Furthermore, it gives a high-performance result in the DUST domain (95%, 93% and 92% of recall, precision and SPS score, respectively).

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Mount DW (2004) Bioinformatics: sequence and genome analysis, 2nd edn. Cold Spring, Harbor, NY Mount DW (2004) Bioinformatics: sequence and genome analysis, 2nd edn. Cold Spring, Harbor, NY
2.
go back to reference Zhai Y, Liu B (2005) Web data extraction based on partial tree alignment. In: Proceeding of the International Conference of World Wide Web (WWW-14), pp 76–85 Zhai Y, Liu B (2005) Web data extraction based on partial tree alignment. In: Proceeding of the International Conference of World Wide Web (WWW-14), pp 76–85
3.
go back to reference Kayed M, Chang C-H (2010) GiVaTech: page-level web data extraction from template pages. IEEE Trans Knowl Data Eng 22(2):249–263CrossRef Kayed M, Chang C-H (2010) GiVaTech: page-level web data extraction from template pages. IEEE Trans Knowl Data Eng 22(2):249–263CrossRef
4.
go back to reference Rodrigues K, Cristo M, De Moura ES, Da Silva A (2015) Removing DUST using multiple alignment of sequences. IEEE Trans Knowl Data Eng 27(8):2261–2274CrossRef Rodrigues K, Cristo M, De Moura ES, Da Silva A (2015) Removing DUST using multiple alignment of sequences. IEEE Trans Knowl Data Eng 27(8):2261–2274CrossRef
5.
go back to reference Bar-Yossef Z, Keidar I (2009) Do not crawl in the DUST: different URLs with similar text. In: ACM Transactions on the Web (TWEB), vol 3. New York, USA, pp 1–31 Bar-Yossef Z, Keidar I (2009) Do not crawl in the DUST: different URLs with similar text. In: ACM Transactions on the Web (TWEB), vol 3. New York, USA, pp 1–31
6.
go back to reference Kayed M (2012) Peer matrix alignment: a new algorithm. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining (ICDM), pp 268–279 Kayed M (2012) Peer matrix alignment: a new algorithm. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining (ICDM), pp 268–279
7.
go back to reference Notredame C (2002) Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 3(1):131–144CrossRef Notredame C (2002) Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 3(1):131–144CrossRef
8.
go back to reference Rubio-Largo A, Vanneschi L, Castelli M, Vega-Rodrguez MA (2018) A characteristic-based framework for multiple sequence aligners. IEEE Trans Cybern 48(1):41–51CrossRef Rubio-Largo A, Vanneschi L, Castelli M, Vega-Rodrguez MA (2018) A characteristic-based framework for multiple sequence aligners. IEEE Trans Cybern 48(1):41–51CrossRef
9.
go back to reference Quoc-Nam T, Wallinga M (2017) UPS: a new approach for multiple sequence alignment using morphing techniques. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp 425–430 Quoc-Nam T, Wallinga M (2017) UPS: a new approach for multiple sequence alignment using morphing techniques. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp 425–430
10.
go back to reference Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25(24):4876–82CrossRef Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25(24):4876–82CrossRef
11.
go back to reference Notredame C, Higgins DG, Heringa J (2000) T-COFFEE: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302(1):205–217CrossRef Notredame C, Higgins DG, Heringa J (2000) T-COFFEE: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302(1):205–217CrossRef
12.
go back to reference Rodriguez PF, Nino LF, Alonso OM (2007) Multiple sequence alignment using swarm intelligence. Int J Comput Intell Res 3(2):123–130CrossRef Rodriguez PF, Nino LF, Alonso OM (2007) Multiple sequence alignment using swarm intelligence. Int J Comput Intell Res 3(2):123–130CrossRef
13.
go back to reference Xu F, Chen Y (2009) A method for multiple sequence alignment based on particle swarm optimization. In: 5th International Conference on Intelligent Computing, ICIC 2009, Ulsan, South Korea, pp 16–19 Xu F, Chen Y (2009) A method for multiple sequence alignment based on particle swarm optimization. In: 5th International Conference on Intelligent Computing, ICIC 2009, Ulsan, South Korea, pp 16–19
14.
go back to reference Zhan Q, Wang N, Jin S, Tan R, Jiang Q, Wang Y (2019) ProbPFP: a multiple sequence alignment algorithm combining hidden Markov model optimized by particle swarm optimization with partition function. BMC Bioinf 20:573CrossRef Zhan Q, Wang N, Jin S, Tan R, Jiang Q, Wang Y (2019) ProbPFP: a multiple sequence alignment algorithm combining hidden Markov model optimized by particle swarm optimization with partition function. BMC Bioinf 20:573CrossRef
15.
go back to reference Chang C-H, Kuo S-C (2004) OLERA: a semi-supervised approach for Web data extraction with visual support. IEEE Intell Syst 19(6):56–64CrossRef Chang C-H, Kuo S-C (2004) OLERA: a semi-supervised approach for Web data extraction with visual support. IEEE Intell Syst 19(6):56–64CrossRef
16.
go back to reference Crescenzi V, Mecca G, Merialdo P (2001) RoadRunner: towards automatic data extraction from large web sites. In: 27th international conference on very large data bases. Morgan Kaufmann Publishers Inc Crescenzi V, Mecca G, Merialdo P (2001) RoadRunner: towards automatic data extraction from large web sites. In: 27th international conference on very large data bases. Morgan Kaufmann Publishers Inc
17.
go back to reference Simon K, Lausen G (2005) ViPER: augmenting automatic information extraction with visual Perceptions. In: Proceedings of the 2005 ACM CIKM international conference on information and knowledge management, Bremen, Germany, pp. 381–388, October 31–November 5, 2005 Simon K, Lausen G (2005) ViPER: augmenting automatic information extraction with visual Perceptions. In: Proceedings of the 2005 ACM CIKM international conference on information and knowledge management, Bremen, Germany, pp. 381–388, October 31–November 5, 2005
18.
go back to reference Yuliana OY, Chang C-H (2016) AFIS: aligning detail-pages for full schema induction. In: Conference on technologies and applications of artificial intelligence (TAAI) Yuliana OY, Chang C-H (2016) AFIS: aligning detail-pages for full schema induction. In: Conference on technologies and applications of artificial intelligence (TAAI)
19.
go back to reference Sleiman HA, Corchuelo R (2013) TEX: an efficient and effective unsupervised web information extractor. Knowl Based Syst 39:109–123CrossRef Sleiman HA, Corchuelo R (2013) TEX: an efficient and effective unsupervised web information extractor. Knowl Based Syst 39:109–123CrossRef
20.
go back to reference Yuliana OY, Chang C-H (2018) A novel alignment algorithm for effective web data extraction from singleton-item pages. Appl Intell 48(11):4355–4370CrossRef Yuliana OY, Chang C-H (2018) A novel alignment algorithm for effective web data extraction from singleton-item pages. Appl Intell 48(11):4355–4370CrossRef
21.
go back to reference Chentoufi A, El Fatmi A, Bekri A, Benhlima S, Sabbane M (2017) Genetic algorithms and dynamic weighted sum method for RNA alignment. In: Intelligent Systems and Computer Vision (ISCV) Conference, pp 1–5 Chentoufi A, El Fatmi A, Bekri A, Benhlima S, Sabbane M (2017) Genetic algorithms and dynamic weighted sum method for RNA alignment. In: Intelligent Systems and Computer Vision (ISCV) Conference, pp 1–5
22.
go back to reference Amorim AR, Visotaky JMV, Contessoto ADC, Neves LA., De Souza RCG, Valncio RC, Zafalon GFD (2017) Performance improvement of genetic algorithm for multiple sequence alignment. In: 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), pp 69–72 Amorim AR, Visotaky JMV, Contessoto ADC, Neves LA., De Souza RCG, Valncio RC, Zafalon GFD (2017) Performance improvement of genetic algorithm for multiple sequence alignment. In: 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), pp 69–72
23.
go back to reference Lalwani S, Kumar R, Gupta N (2013) A review on particle swarm optimization variants and their applications to multiple sequence alignment. J Appl Math Bioinf 3(2):87–124 Lalwani S, Kumar R, Gupta N (2013) A review on particle swarm optimization variants and their applications to multiple sequence alignment. J Appl Math Bioinf 3(2):87–124
24.
go back to reference Jagadamba PVSL, Babu MSP, Rao AA, Rao PKS (2011) An improved algorithm for multiple sequence alignment using particle swarm optimization. In: IEEE 2nd International Conference on Software Engineering and Service Science, pp 544–547 Jagadamba PVSL, Babu MSP, Rao AA, Rao PKS (2011) An improved algorithm for multiple sequence alignment using particle swarm optimization. In: IEEE 2nd International Conference on Software Engineering and Service Science, pp 544–547
25.
go back to reference Kamal A, Mahroos M, Sayed A, Nassar A (2012) Parallel particle swarm optimization for global multiple sequence alignment. Inf Technol J 11(8):998–1006CrossRef Kamal A, Mahroos M, Sayed A, Nassar A (2012) Parallel particle swarm optimization for global multiple sequence alignment. Inf Technol J 11(8):998–1006CrossRef
26.
go back to reference Chang C-H, Chang C-H, Kayed M (2012) Fivatech2: a supervised approach to role differentiation for web data extraction from template pages. In: 26th Annual Conference of the Japanese Society for Artificial Intelligence, Special Session on Web Intelligence & Data Mining, pp 1–9 Chang C-H, Chang C-H, Kayed M (2012) Fivatech2: a supervised approach to role differentiation for web data extraction from template pages. In: 26th Annual Conference of the Japanese Society for Artificial Intelligence, Special Session on Web Intelligence & Data Mining, pp 1–9
Metadata
Title
NestMSA: a new multiple sequence alignment algorithm
Authors
Mohammed Kayed
Ahmed A. Elngar
Publication date
19-02-2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 11/2020
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03206-0

Other articles of this Issue 11/2020

The Journal of Supercomputing 11/2020 Go to the issue

Premium Partner