skip to main content
10.1145/1102256.1102258acmconferencesArticle/Chapter ViewAbstractPublication PagesgeccoConference Proceedingsconference-collections
Article

The evolutionary computation approach to motif discovery in biological sequences

Published:25 June 2005Publication History

ABSTRACT

Finding motifs — patterns of conserved residues — within nucleotide and protein sequences is a key part of understanding function and regulation within biological systems. This paper presents a review of current approaches to motif discovery, both evolutionary computation based and otherwise, and a speculative look at the advantages of the evolutionary computation approach and where it might lead us in the future. Particular attention is given to the problem of characterising regulatory DNA motifs and the value of expressive representations for providing accurate classification.

References

  1. L. A. Anbarasu, V. Sundararajan, and N. Narayanasamy. Parallel genetic algorithm for performance-driven sequence alignment. In L. Spector, E. D. Goodman, A. Wu, et al., eds., Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), p. 1015. Morgan Kaufmann, San Francisco, California, USA, 7--11 Jul. 2001.Google ScholarGoogle Scholar
  2. T. K. Attwood. The PRINTS database: a resource for identification of protein families. Brief Bioinform, 3(3):252--63, Sep 2002.Google ScholarGoogle ScholarCross RefCross Ref
  3. R. M. A. Azad and C. Ryan. Structural emergence with order independent representations. In E. C.-P. et al., ed., Proceedings of the 2003 Genetic and Evolutionary Computation Conference, GECCO 2003, vol. 2724 of Lecture Notes in Computer Science, pp. 1626--1638. Springer-Verlag, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. T. L. Bailey and W. S. Noble. Searching for statistically significant regulatory modules. Bioinformatics, 19 Suppl 2:II16-II25, Oct 2003.Google ScholarGoogle Scholar
  5. P. Baldi and S. Brunak. Bioinformatics: the machine learning approach. MIT Press, 2 edn., 2001. Baldi. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Bateman, L. Coin, R. Durbin, et al. The Pfam protein families database. Nucleic Acids Res, 32(Database issue):D138--41, Jan 2004.Google ScholarGoogle Scholar
  7. H. M. Berman, J. Westbrook, Z. Feng, et al. The protein data bank. Nucleic Acids Research, 28:235--242, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  8. K. Blekas, D. I. Fotiadis, and A. Likas. Motif-based protein sequence classification using neural networks. Journal of Computational Biology, 12(1):64--82, February 2005.Google ScholarGoogle ScholarCross RefCross Ref
  9. B. Boeckmann, A. Bairoch, R. Apweiler, et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res, 31(1):365--70, Jan 2003.Google ScholarGoogle ScholarCross RefCross Ref
  10. A. Brazma, I. Jonassen, I. Eidhammer, et al. Approaches to the automatic discovery of patterns in biosequences. Journal of Computational Biology, 5(2):277--304, 1998.Google ScholarGoogle ScholarCross RefCross Ref
  11. C. Bru, E. Courcelle, S. Carrre, et al. The ProDom database of protein domain families: more emphasis on 3D. Nucleic Acids Res, 33 Database Issue:D212--5, Jan 2005.Google ScholarGoogle Scholar
  12. L. Cai, D. Juedes, and E. Liakhovitch. Evolutionary computation techniques for multiple sequence alignment. In Proceedings of the 2000 Congress on Evolutionary Computation CEC00, pp. 829--835. IEEE Press, La Jolla Marriott Hotel La Jolla, California, USA, 6--9 Jul. 2000.Google ScholarGoogle ScholarCross RefCross Ref
  13. R. Carr, W. Hart, N. Krasnogor, et al. Alignment of protein structures with a memetic evolutionary algorithm. In W. B. Langdon, E. Cantú-Paz, K. Mathias, et al., eds., GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1027--1034. Morgan Kaufmann Publishers, New York, 9--13 Jul. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. B. C. H. Chang, A. Ratnaweera, H. Halgamuge, et al. Particle swarm optimisation for protein motif discovery. Genetic Programming and Evolvable Machines, 5(2):203--214, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. K. E. Chapman and S. J. Higgins. Regulation of Gene Expression, vol. 37 of Essays in Biochemistry. Portland Press, 2001.Google ScholarGoogle Scholar
  16. C. Chau, S. Kwong, C. Diu, et al. Optimization of HMM by a genetic algorithm. In Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on, vol. 3, pp. 1727 -- 1730. 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. K. Chellapilla and G. B. Fogel. Multiple sequence alignment using evolutionary programming. In P. J. Angeline, Z. Michalewicz, M. Schoenauer, et al., eds., Proceedings of the Congress on Evolutionary Computation, vol. 1, pp. 445--452. IEEE Press, Mayflower Hotel, Washington D.C., USA, 6--9 Jul. 1999.Google ScholarGoogle Scholar
  18. R. Choudhury. Application of evolutionary computation for multiple sequence alignment. Project report, Stanford University, 2003.Google ScholarGoogle Scholar
  19. M. Conrad. The geometry of evolution. BioSystems, 24:61--81, 1990.Google ScholarGoogle ScholarCross RefCross Ref
  20. D. Corne, A. Meade, and R. Sibly. Evolving core promoter signal motifs. In Proceedings of the 2001 Congress on Evolutionary Computation CEC2001, pp. 1162--1169. IEEE Press, COEX, World Trade Center, 159 Samseong-dong, Gangnam-gu, Seoul, Korea, 27--30 May 2001.Google ScholarGoogle ScholarCross RefCross Ref
  21. R. Durbin, S. R. Eddy, A. Krogh, et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1999.Google ScholarGoogle Scholar
  22. L. Falquet, M. Pagni, P. Bucher, et al. The PROSITE database, its status in 2002. Nucleic Acids Research, 30:235--238, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  23. M. Frith, U. Hansen, and Z. Weng. Detection of cis-element clusters in higher eukaryotic DNA. Bioinformatics, 17(10):878--89, Oct 2001.Google ScholarGoogle ScholarCross RefCross Ref
  24. D. E. Goldberg, K. Deb, H. Kargupta, et al. Rapid accurate optimization of difficult problems using fast messy genetic algorithms. In S. Forrest, ed., Proceedings of The Fifth International Conference On Genetic Algorithms. Morgan Kaufmann, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. R. R. Gonzalez, C. M. Izquierdo, and J. Seijas. Multiple protein sequence comparison by genetic algorithms. In S. K. Rogers, D. B. Fogel, J. C. Bezdek, et al., eds., Proceedings of SPIE, vol. 3390 of Applications and Science of Computational Intelligence, pp. 99--102. 1998.Google ScholarGoogle Scholar
  26. R. Goodacre and D. B. Kell. Evolutionary computation for the interpretation of metabolome data. In G. G. Harrigan and R. Goodacre, eds., Metabolic profiling: its role in biomarker discovery and gene function analysis, pp. 239--256. Kluwer, Boston, 2003.Google ScholarGoogle Scholar
  27. K. Hanada, T. Yokoyama, and T. Shimizu. Multiple sequence alignment by genetic algorithm. Genome Informatics, 11:317--318, 2000.Google ScholarGoogle Scholar
  28. S. Handley. Predicting whether or not a nucleic acid sequence is an E. coli promoter region using genetic programming. In Proceedings of the First International Symposium on Intelligence in Neural and Biological Systems INBS-95, pp. 122--127. IEEE Computer Society Press, Herndon, Virginia, USA, 29--31 May 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. A. Heddad, M. Brameier, and M. MacCallum. Evolving regular expression-based sequence classifiers for protein nuclear localisation. In G. R. Raidl, S. Cagnoni, J. Branke, et al., eds., Applications of Evolutionary Computing, EvoWorkshops2004, vol. 3005 of LNCS, pp. 31--40. Springer Verlag, Coimbra, Portugal, 5--7 Apr. 2004.Google ScholarGoogle Scholar
  30. S. Henikoff, J. Henikoff, and S. Pietrokovski. Blocks+: a non-redundant database of protein alignment blocks derived from multiple compilations. Bioinformatics, 15(6):471--9, Jun 1999.Google ScholarGoogle ScholarCross RefCross Ref
  31. L. S. Ho and J. Rajapakse. High sensitivity technique for translation initiation site detection. In Proceedings of the 2004 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, pp. 153--159. 2004.Google ScholarGoogle ScholarCross RefCross Ref
  32. J.-T. Horng, C.-M. Lin, B.-J. Liu, et al. Using genetic algorithms to solve multiple sequence alignments. In D. Whitley, D. Goldberg, E. Cantu-Paz, et al., eds., Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2000), pp. 883--890. Morgan Kaufmann, Las Vegas, Nevada, USA, 10--12 Jul. 2000.Google ScholarGoogle Scholar
  33. D. Howard and K. Benson. Evolutionary computation method for pattern recognition of cis-acting sites. Biosystems, 72(1--2):19--27, Nov. 2003. Special Issue on Computational Intelligence in Bioinformatics.Google ScholarGoogle Scholar
  34. Y.-J. Hu. Biopattern discovery by genetic programming. In J. K. et al, ed., Proceedings Genetic Programming 1998, pp. 152--157. Morgan Kaufmann, 1998.Google ScholarGoogle Scholar
  35. K. Huang, J. Louis, L. Donaldson, et al. Solution structure of the MEF2A-DNA complex: structural basis for the modulation of DNA bending and specificity by MADS-box transcription factors. EMBO J., 19:2615--2628, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  36. M. Ishikawa, T. Toya, Y. Totoki, et al. Parallel iterative aligner with genetic algorithm. In Genome Informatics Workshop IV, pp. 13--22. 1993.Google ScholarGoogle Scholar
  37. M. Isokawa, M. Wayama, and T. Shimizu. Multiple sequence alignment using a genetic algorithm. Genome Informatics, 7:176--177, 1996.Google ScholarGoogle Scholar
  38. S. A. Kauffman. Metabolic stability and epigenesis in randomly constructed genetic nets. Journal of Theoretical Biology, 22:437--467, 1969.Google ScholarGoogle ScholarCross RefCross Ref
  39. J. Koza, F. Bennett, and D. Andre. Using programmatic motifs and genetic programming to classify protein sequences as to extracellular and membrane cellular location. In V. W. Porto, N. Saravanan, D. Waagen, et al., eds., Evolutionary Programming VII: Proceedings of the Seventh Annual Conference on Evolutionary Programming, vol. 1447 of LNCS. Springer-Verlag, Mission Valley Marriott, San Diego, California, USA, 25--27 Mar. 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. J. R. Koza and D. Andre. Automatic discovery of protein motifs using genetic programming. In X. Yao, ed., Evolutionary Computation: Theory and Applications. World Scientific, Singapore, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  41. S. Kwong, C. W. Chaua, K. F. Manb, et al. Optimisation of HMM topology and its model parameters by genetic algorithms. Pattern Recognition, 34(2):509--522, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  42. C.-C. Lai and S.-W. Chung. Multiple DNA sequences alignment by means of genetic algorithm. In Design and application of hybrid intelligent systems, pp. 224--232. IOS Press, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. D. Latchman. Eukaryotic Transcription Factors. Academic Press, 3 edn., 1999.Google ScholarGoogle Scholar
  44. S. Leopold. An alignment graph based evolutionary algorithm for the multiple sequence alignment problem. Master's thesis, Vienna University of Technology, Institute of Computer Graphics and Algorithms, 2004.Google ScholarGoogle Scholar
  45. A. M. Lesk. Introduction to Protein Architecture. Oxford University Press, 2000.Google ScholarGoogle Scholar
  46. O. Littlefield, Y. Korkhin, and P. Sigler. The structural basis for the oriented assembly of a TBP/TFB/promoter complex. Proc. Natl. Acad. Sci. USA, 96:13668--13673, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  47. M. A. Lones. Enzyme Genetic Programming: Modelling Biological Evolvability in Genetic Programming. Ph.D. thesis, Department of Electronics, University of York, 2003.Google ScholarGoogle Scholar
  48. N. J. Mulder, R. Apweiler, T. K. Attwood, et al. InterPro, progress and status in 2005. Nucleic Acids Res, 33 Database Issue:D201--5, Jan 2005.Google ScholarGoogle Scholar
  49. C. L. Nehaniv. Editorial for special issue on evolvability. BioSystems, 69:77--81, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  50. H. D. Nguyen, I. Yoshihara, K. Yamamori, et al. A parallel hybrid genetic algorithm for multiple protein sequence alignment. In D. B. Fogel, M. A. El-Sharkawi, X. Yao, et al., eds., Proceedings of the 2002 Congress on Evolutionary Computation CEC2002, pp. 309--314. IEEE Press, 2002.Google ScholarGoogle Scholar
  51. H. D. Nguyen, I. Yoshihara, Y. Yamamori, et al. Improved GA-based method for multiple protein sequence alignment. In R. Sarker, R. Reynolds, H. Abbass, et al., eds., Proceedings of the 2003 Congress on Evolutionary Computation CEC2003, pp. 1826--1832. IEEE Press, Canberra, 8--12 Dec. 2003.Google ScholarGoogle ScholarCross RefCross Ref
  52. C. Notredame and D. G. Higgins. SAGA: sequence alignment by genetic algorithm. Nucleic Acids Res., 24(8):1515--1524, 1996.Google ScholarGoogle ScholarCross RefCross Ref
  53. B. Olsson. Using evolutionary algorithms in the design of protein fingerprints. In W. Banzhaf, J. Daida, A. E. Eiben, et al., eds., Proceedings of the Genetic and Evolutionary Computation Conference, vol. 2, pp. 1636--1642. Morgan Kaufmann, Orlando, Florida, USA, 13--17 Jul. 1999.Google ScholarGoogle Scholar
  54. A. Pedersen, P. Baldi, Y. Chauvin, et al. The biology of eukaryotic promoter prediction---a review. Comput Chem, 23(3--4):191--207, Jun 1999.Google ScholarGoogle Scholar
  55. D. Pribnow. Nucleotide sequence of an RNA polymerase binding site at an early T7 promoter. Proc. Natl. Acad. Sci. USA, 72:784--788, 1975.Google ScholarGoogle ScholarCross RefCross Ref
  56. B. J. Ross. The evolution of stochastic regular motifs for protein sequences. New Generation Computing, 20(2):187--213, Feb. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. L. Sheneman and J. A. Foster. Evolving better multiple sequence alignments. In K. Deb, R. Poli, W. Banzhaf, et al., eds., Genetic and Evolutionary Computation -- GECCO-2004, Part I, vol. 3102 of Lecture Notes in Computer Science, pp. 449--460. Springer-Verlag, Seattle, WA, USA, 26--30 Jun. 2004.Google ScholarGoogle Scholar
  58. L. J. Sheneman and J. A. Foster. Evolving guide trees in progressive multiple sequence alignment. In A. M. Barry, ed., GECCO 2003: Proceedings of the Bird of a Feather Workshops, Genetic and Evolutionary Computation Conference, pp. 89--91. AAAI, Chigaco, 11 Jul. 2003.Google ScholarGoogle Scholar
  59. C. Shyu, L. Sheneman, and J. A. Foster. Multiple sequence alignment with evolutionary computation. Genetic Programming and Evolvable Machines, 5(2):121--144, 2004. Prepublication Date: 03/23/2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. M. Slimane, G. Venturini, J. P. A. de Beauville, et al. Optimizing hidden Markov models with a genetic algorithm. In J.-M. Alliot, E. Lutton, E. M. A. Ronald, et al., eds., Artificial Evolution, vol. 1063 of Lecture Notes in Computer Science, pp. 384--396. Springer, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. G. Stormo. DNA binding sites: representation and discovery. Bioinformatics, 16(1):16--23, January 2000.Google ScholarGoogle ScholarCross RefCross Ref
  62. J. Thompson, D. Higgins, and T. Gibson. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res, 22(22):4673--80, Nov 1994.Google ScholarGoogle ScholarCross RefCross Ref
  63. R. Thomsen. Evolving the topology of hidden markov models using evolutionary algorithms. In J. J. M. Guervós, P. Adamidis, H.-G. Beyer, et al., eds., Parallel Problem Solving from Nature - PPSN VII, vol. 2439 of Lecture Notes in Computer Science, pp. 861--870. Springer, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. R. Thomsen, G. B. Fogel, and T. Krink. A clustal alignment improver using evolutionary algorithms. In D. B. Fogel, X. Yao, G. Greenwood, et al., eds., Proceedings of the Fourth Congress on Evolutionary Computation (CEC-2002), vol. 1, pp. 121--126. 2002.Google ScholarGoogle Scholar
  65. E. E. Vallejo and F. Ramos. Evolving Turing machines for biosequences recognition and analysis. In J. F. Miller, M. Tomassini, P. L. Lanzi, et al., eds., Genetic Programming, Proceedings of EuroGP'2001, vol. 2038 of LNCS, pp. 192--203. Springer-Verlag, Lake Como, Italy, 18--20 Apr. 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. E. E. Vallejo and F. Ramos. Evolutionary two-dimensional DNA sequence alignment. In E. Cantú-Paz, J. A. Foster, K. Deb, et al., eds., Genetic and Evolutionary Computation -- GECCO-2003, vol. 2723 of LNCS, pp. 429--430. Springer-Verlag, 12-16 Jul. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. A. Vanet, L. Marsan, and M.-F. Sagot. Promoter sequences and algorithmical methods for identifying them. Res. Microbiol., 150:779--799, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  68. M. Wayama, K. Takahashi, and T. Shimizu. An approach to amino acid sequence alignment using a genetic algorithm. Genome Informatics, 6:122--123, 1995.Google ScholarGoogle Scholar
  69. T. Werner. Models for prediction and recognition of eukaryotic promoters. Mamm Genome, 10(2):168--75, Feb 1999.Google ScholarGoogle ScholarCross RefCross Ref
  70. T. Werner. The state of the art of mammalian promoter recognition. Brief Bioinform, 4(1):22--30, Mar 2003.Google ScholarGoogle ScholarCross RefCross Ref
  71. K.-J. Won, A. Prügel-Bennett, and A. Krogh. Training HMM structure with genetic algorithm for biological sequence analysis. Bioinformatics, 20(18):3613--9, Dec 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. T. Yada, M. Ishikawa, H. Tanaka, et al. Extraction of hidden Markov model representations of signal patterns in DNA sequences. In Pacific Symposium on Biocomputing, pp. 686--96. 1996.Google ScholarGoogle Scholar
  73. C. Zhang and A. K. Wong. A genetic algorithm for multiple molecular sequence alignment. Comput Appl Biosci., 13(6):565--581, 1997.Google ScholarGoogle Scholar

Index Terms

  1. The evolutionary computation approach to motif discovery in biological sequences

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in
              • Published in

                cover image ACM Conferences
                GECCO '05: Proceedings of the 7th annual workshop on Genetic and evolutionary computation
                June 2005
                431 pages
                ISBN:9781450378000
                DOI:10.1145/1102256

                Copyright © 2005 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 25 June 2005

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • Article

                Acceptance Rates

                Overall Acceptance Rate1,669of4,410submissions,38%

                Upcoming Conference

                GECCO '24
                Genetic and Evolutionary Computation Conference
                July 14 - 18, 2024
                Melbourne , VIC , Australia

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader