Skip to main content
Top
Published in: Soft Computing 10/2011

01-10-2011 | Focus

A genetic programming method for protein motif discovery and protein classification

Authors: Denise Fukumi Tsunoda, Alex Alves Freitas, Heitor Silvério Lopes

Published in: Soft Computing | Issue 10/2011

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Proteins can be grouped into families according to some features such as hydrophobicity, composition or structure, aiming to establish common biological functions. This paper presents MAHATMA—memetic algorithm-based highly adapted tool for motif ascertainment—a system that was conceived to discover features (particular sequences of amino acids, or motifs) that occur very often in proteins of a given family but rarely occur in proteins of other families. These features can be used for the classification of unknown proteins, that is, to predict their function by analyzing their primary structure. Experiments were done with a set of enzymes extracted from the Protein Data Bank. The heuristic method used was based on genetic programming using operators specially tailored for the target problem. The final performance was measured using sensitivity, specificity and hit rate. The best results obtained for the enzyme dataset suggest that the proposed evolutionary computation method is effective in finding predictive features (motifs) for protein classification.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
go back to reference Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410 Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
go back to reference Banzhaf W, Nordin P, Keller RE, Francone FD (1998) Genetic programming: an introduction. Morgan Kaufmann, San Mateo, CAMATH Banzhaf W, Nordin P, Keller RE, Francone FD (1998) Genetic programming: an introduction. Morgan Kaufmann, San Mateo, CAMATH
go back to reference Branden CI, Tooze J (1999) Introduction to protein structure. Garland, New York Branden CI, Tooze J (1999) Introduction to protein structure. Garland, New York
go back to reference desJardins M, Karp PD, Krummenacker M, Lee TJ (1997) Prediction of enzyme classification from protein sequence without the use of sequence similarity. ISMB-97 Proceedings, pp 92–99 desJardins M, Karp PD, Krummenacker M, Lee TJ (1997) Prediction of enzyme classification from protein sequence without the use of sequence similarity. ISMB-97 Proceedings, pp 92–99
go back to reference Eiben AE, Smith JE (2003) Introduction to evolutionary computing, 2nd printing. Natural computing series. Springer, Berlin Eiben AE, Smith JE (2003) Introduction to evolutionary computing, 2nd printing. Natural computing series. Springer, Berlin
go back to reference Freitas AA, de Carvalho ACPLF (2007) A tutorial on hierarchical classification with applications in bioinformatics. In: Taniar D (ed) Research and trends in data mining technologies and applications, Idea Group, pp 175–208 Freitas AA, de Carvalho ACPLF (2007) A tutorial on hierarchical classification with applications in bioinformatics. In: Taniar D (ed) Research and trends in data mining technologies and applications, Idea Group, pp 175–208
go back to reference Freitas AA, Wieser DC, Apweiler R (2010) On the importance of comprehensible classification models for protein function prediction. IEEE/ACM Trans Comput Biol Bioinform 7(1):172–182. doi:10.1109/TCBB.2008.47 CrossRef Freitas AA, Wieser DC, Apweiler R (2010) On the importance of comprehensible classification models for protein function prediction. IEEE/ACM Trans Comput Biol Bioinform 7(1):172–182. doi:10.​1109/​TCBB.​2008.​47 CrossRef
go back to reference Goldberg DE (1989) Genetic algorithms in search optimization and machine learning. Addison-Wesley, ReadingMATH Goldberg DE (1989) Genetic algorithms in search optimization and machine learning. Addison-Wesley, ReadingMATH
go back to reference Hsu WH (2009) Genetic programming. In: Wang J (ed) Encyclopedia of data warehousing and mining, 2nd edn. Idea Group Inc. Global, pp 926–931 Hsu WH (2009) Genetic programming. In: Wang J (ed) Encyclopedia of data warehousing and mining, 2nd edn. Idea Group Inc. Global, pp 926–931
go back to reference Jensen LJ, Gupta R, Blom N, Devos D, Tamames J, Kesmir C, Nielsen H, Staerfeldt HH, Rapacki K, Workman C, Andersen CAF, Knudsen S, Krogh A, Valencia A, Brunak S (2002) Prediction of human protein function from post-translational modifications and localization features. J Mol Biol 319:1257–1265. doi:10.1016/S0022-2836(02)00379-0 CrossRef Jensen LJ, Gupta R, Blom N, Devos D, Tamames J, Kesmir C, Nielsen H, Staerfeldt HH, Rapacki K, Workman C, Andersen CAF, Knudsen S, Krogh A, Valencia A, Brunak S (2002) Prediction of human protein function from post-translational modifications and localization features. J Mol Biol 319:1257–1265. doi:10.​1016/​S0022-2836(02)00379-0 CrossRef
go back to reference Kaminska KH, Milanowska K, Bujnicki JM (2009) The basics of protein sequence analysis. In: Bujnicki JM (ed) Prediction of protein structures, functions, and interactions, pp 1–38. doi:10.1002/9780470741894 Kaminska KH, Milanowska K, Bujnicki JM (2009) The basics of protein sequence analysis. In: Bujnicki JM (ed) Prediction of protein structures, functions, and interactions, pp 1–38. doi:10.​1002/​9780470741894
go back to reference Koza JR (1992) Genetic programming—on the programming of computers by means of natural selection. MIT Press, CambridgeMATH Koza JR (1992) Genetic programming—on the programming of computers by means of natural selection. MIT Press, CambridgeMATH
go back to reference Koza JR (1994) Genetic programming ii: automatic discovery of reusable programs. MIT Press, CambridgeMATH Koza JR (1994) Genetic programming ii: automatic discovery of reusable programs. MIT Press, CambridgeMATH
go back to reference Larose DT (2006) Data mining methods and models. Wiley and Sons, Hoboken, NJMATH Larose DT (2006) Data mining methods and models. Wiley and Sons, Hoboken, NJMATH
go back to reference Lehninger AL, Nelson DL, Cox MM (1998) Principles of biochemistry, 2nd edn. Worth Publishers, New York Lehninger AL, Nelson DL, Cox MM (1998) Principles of biochemistry, 2nd edn. Worth Publishers, New York
go back to reference Lesk AM (2001) Introduction to protein architecture. Oxford University Press Inc., New York Lesk AM (2001) Introduction to protein architecture. Oxford University Press Inc., New York
go back to reference Lopes HS (1996) Analogia e Aprendizado Evolucionário: uma aplicação em diagnóstico clínico. PhD thesis, Brazil (in Portuguese) Lopes HS (1996) Analogia e Aprendizado Evolucionário: uma aplicação em diagnóstico clínico. PhD thesis, Brazil (in Portuguese)
go back to reference Moscato P (1989) On evolution, search, optimization, genetic algorithms and martial arts: towards memetic algorithms. Technical report Caltech Concurrent Computation Program, No. 826, CA Moscato P (1989) On evolution, search, optimization, genetic algorithms and martial arts: towards memetic algorithms. Technical report Caltech Concurrent Computation Program, No. 826, CA
go back to reference Nisbet R, Elder J, Miner G (2009) Statistical analysis and data mining applications. Elsevier, San Diego, CAMATH Nisbet R, Elder J, Miner G (2009) Statistical analysis and data mining applications. Elsevier, San Diego, CAMATH
go back to reference Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo, CA Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo, CA
go back to reference Rost B, Liu J, Nair R, Wrzeszczynski KO, Ofran Y (2003) Automatic prediction of protein function. CMLS Cell Mol Life Sci 60:2637–2650CrossRef Rost B, Liu J, Nair R, Wrzeszczynski KO, Ofran Y (2003) Automatic prediction of protein function. CMLS Cell Mol Life Sci 60:2637–2650CrossRef
go back to reference Silla Jr CN, Freitas AA (2010) A survey of hierarchical classification across different application domains. Data Min Knowl Discov (in press) Silla Jr CN, Freitas AA (2010) A survey of hierarchical classification across different application domains. Data Min Knowl Discov (in press)
go back to reference Tsunoda DF, Lopes HS (2005) Automatic motif discovery in an enzyme database using a genetic algorithm-based approach. Soft Comput Fusion Found Methodol Appl 10(4):325–330. doi:10.1007/s00500-005-0490-z Tsunoda DF, Lopes HS (2005) Automatic motif discovery in an enzyme database using a genetic algorithm-based approach. Soft Comput Fusion Found Methodol Appl 10(4):325–330. doi:10.​1007/​s00500-005-0490-z
go back to reference Tsunoda DF, Freitas AA, Lopes HS (2009) MAHATMA: a genetic programming-based tool for protein classification. In: Proc 2009 ninth international conference on intelligent systems design and applications (ISDA-09), IEEE Press, pp 1136–1142 Tsunoda DF, Freitas AA, Lopes HS (2009) MAHATMA: a genetic programming-based tool for protein classification. In: Proc 2009 ninth international conference on intelligent systems design and applications (ISDA-09), IEEE Press, pp 1136–1142
go back to reference Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Mateo, CA Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Mateo, CA
Metadata
Title
A genetic programming method for protein motif discovery and protein classification
Authors
Denise Fukumi Tsunoda
Alex Alves Freitas
Heitor Silvério Lopes
Publication date
01-10-2011
Publisher
Springer-Verlag
Published in
Soft Computing / Issue 10/2011
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-010-0624-9

Other articles of this Issue 10/2011

Soft Computing 10/2011 Go to the issue

Premium Partner