Data Mining for Bioinformatics

  • Chapter
Bioinformatics Technologies

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Liew, A.W.C., Yan, H., Yang, M. (2005). Data Mining for Bioinformatics. In: Chen, YP.P. (eds) Bioinformatics Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-26888-X_4

  • DOI: https://doi.org/10.1007/3-540-26888-X_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20873-0

  • Online ISBN: 978-3-540-26888-8

  • eBook Packages: Computer Science (R0)
